C# HashSet init takes too long

I need to initialize a HashSet with a set of elements, but without any kind of comparer class. After the init, any element added to the HashSet needs to go through a comparer.
How can I accomplish this?
Right now I have this:
HashSet<Keyword> set = new HashSet<Keyword>(new KeyWordComparer());
The problem is that the init takes too long, and there's no need to apply the comparison during it.
The KeyWordComparer class:
class KeyWordComparer : EqualityComparer<Keyword>
{
    public override bool Equals(Keyword k1, Keyword k2)
    {
        int equals = 0;
        int i = 0;
        int j = 0;
        // based on sorted ids
        while (i < k1._lista_modelos.Count && j < k2._lista_modelos.Count)
        {
            if (k1._lista_modelos[i] < k2._lista_modelos[j])
            {
                i++;
            }
            else if (k1._lista_modelos[i] > k2._lista_modelos[j])
            {
                j++;
            }
            else
            {
                equals++;
                i++;
                j++;
            }
        }
        return equals >= 8;
    }

    public override int GetHashCode(Keyword keyword)
    {
        return 0; // notice that using the same hash for all keywords gives you an O(n^2) time complexity though.
    }
}
Note: This is a follow-up question to c# comparing list of IDs.

Every keyword has 20 IDs, so when I want to add a new Keyword to the HashSet, the KeyWordComparer checks that the new one does not share more than 8 IDs with any keyword already in the HashSet. If it does, the new keyword is not included; otherwise, it is.
Collecting these keywords is not a job for a hash set. A hash set is generally not suited for items whose membership depends on other elements of the set; you should only use it for things where a useful hash can be calculated for every item on its own. Since whether a new item gets added depends on the existing contents of your set, this is totally the wrong tool.
Here's an attempt to solve this problem according to your short description of what you actually want to do. Here, we simply collect the keywords in a list. To verify that a keyword may be added, we use an additional hash set to collect the ids of the accepted keywords. That way we can quickly check, for a new item, whether 8 or more of its ids are already contained within the list of keywords.
var keywords = new List<Keyword>();
var selectedIds = new HashSet<int>(); // I'm assuming that the ids are ints here
foreach (var keyword in GetListOfAllKeywords())
{
    // count the number of keyword ids that are already in the selectedIds set
    var duplicateIdCount = keyword.Ids.Count(id => selectedIds.Contains(id));
    if (duplicateIdCount <= 8)
    {
        // 8 or fewer ids are already selected, so add this keyword
        keywords.Add(keyword);
        // and collect all the keyword's ids
        // (HashSet<T> has no AddRange; UnionWith adds every id in the sequence)
        selectedIds.UnionWith(keyword.Ids);
    }
}

Leaving aside whether a HashSet is the right type for the job at hand, or whether your comparer even makes sense, implementing a proper GetHashCode does make a huge difference.
Here is an example implementation, based on an answer from Marc Gravell:
class KeyWordComparer : EqualityComparer<Keyword>
{
    // omitted your Equals implementation for brevity

    public override int GetHashCode(Keyword keyword)
    {
        //return 0; // this was the original
        // Marc Gravell https://stackoverflow.com/a/371348/578411
        int hash = 13;
        // not sure what is up with the "only 8 IDs", but I take that as a given
        for (var i = 0; i < Math.Min(keyword._lista_modelos.Count, 8); i++)
        {
            hash = (hash * 7) + keyword._lista_modelos[i].GetHashCode();
        }
        return hash;
    }
}
When I run this in LINQPad with this test rig:
Random randNum = new Random();
var kc = new KeyWordComparer();
HashSet<Keyword> set = new HashSet<Keyword>(kc);
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10000; i++)
{
    set.Add(new Keyword(Enumerable
        .Repeat(0, randNum.Next(1, 10))
        .Select(ir => randNum.Next(1, 256)).ToList()));
}
sw.Stop();
sw.ElapsedMilliseconds.Dump("ms");
this is what I measure:
7 ms for 10,000 items
If I switch back to your return 0; implementation for GetHashCode, I measure
4754 ms for 10,000 items
If I increase the test loop to insert 100,000 items, the better GetHashCode still completes in 224 ms on my box. I didn't wait for your implementation to finish.
So if anything, implement a proper GetHashCode method.
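As a side note, on newer runtimes (.NET Core 2.1 and later) System.HashCode can do the combining for you. A minimal sketch of the same idea, keeping the cap of eight ids:
public override int GetHashCode(Keyword keyword)
{
    var hash = new HashCode();
    // combine up to the first 8 sorted ids, mirroring the loop above
    for (int i = 0; i < Math.Min(keyword._lista_modelos.Count, 8); i++)
    {
        hash.Add(keyword._lista_modelos[i]);
    }
    return hash.ToHashCode();
}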

Related

change values in arrays until all values are different

I am trying to generate 5 random numbers in an array and output them; however, I don't want any 2 values to be the same. What do I need to add to this code to prevent this?
using System;

public class Program
{
    public static void Main()
    {
        int count = 0;
        int Randomnum = 0;
        int[] num = new int[5];
        Random r = new Random();
        while (count < 5)
        {
            Randomnum = r.Next(1, 10);
            num[count] = Randomnum;
            count = count + 1;
        }
        foreach (var entry in num)
        {
            Console.WriteLine(entry);
        }
    }
}
You could get your full set using Enumerable.Range, order the numbers by a random value, and take the top 5, i.e.:
var numberSet = Enumerable.Range(1,10);
var randomSet = numberSet.OrderBy(s => Guid.NewGuid()).Take(5);
foreach (var entry in randomSet)
{
Console.WriteLine(entry);
}
Your current implementation could be edited so that it adds the new random value (and increments the counter) only if num does not already contain the value.
Your variables are defined as follows:
int count = 0;
int randomNum = 0;
int[] num = new int[5];
Random r = new Random();
One way of checking whether or not num contains the new value is by using Array.IndexOf(). This method returns the index at which the value you provide is found in the array that you provide. If the value you provide is not found in the array, the method will return -1.
(Note: Array.IndexOf() specifically returns the lower bound of the array minus 1 when no match is found. Seeing as you populate num starting at index 0, the return value is therefore -1 in your scenario. More about the computation of an array's lower bound here).
The implementation of your while loop could thus be adjusted to:
while (count < 5)
{
    randomNum = r.Next(1, 10);
    if (Array.IndexOf(num, randomNum) < 0)
    {
        num[count] = randomNum;
        count += 1;
    }
}
An alternative to using Array.IndexOf() is to use Enumerable.Contains() from the System.Linq namespace. I find it to be more readable, so I just thought I'd mention it.
//using System.Linq;
while (count < 5)
{
    randomNum = r.Next(1, 10);
    if (!num.Contains(randomNum))
    {
        num[count] = randomNum;
        count += 1;
    }
}
That being said, you may want to consider using a HashSet rather than an array for this scenario. A HashSet can only contain distinct values, which is what you want to achieve.
HashSet's .Add() method actually checks whether your HashSet already contains the value you are trying to add. If it does, the value will not be added again.
Due to this behavior, you can call .Add() for every random value that you generate, without manually having to check for existence beforehand.
Another beneficial side effect of this is that your count and randomNum variables are no longer necessary.
Using a HashSet rather than an array, your code (prior to the code that prints the result to the console) could be implemented as follows:
//using System.Collections.Generic;
HashSet<int> num = new();
Random r = new Random();
while (num.Count < 5)
{
    num.Add(r.Next(1, 10));
}
Example fiddle with all three implementations here.

Combination of a list of lists so that each combination has unique elements

Ok so, I have a list of lists, like the title says, and I want to make combinations of k lists in which every list has different elements from the rest.
Example:
I have the following list of lists:
{ {1,2,3} , {1,11} , {2,3,6} , {6,5,7} , {4,8,9} }
A valid 3-sized combination of these lists could be:
{ {1,11}, {4,8,9} ,{6,5,7} }
This is only ONE of the valid combinations, what I want to return is a list of all the valid combinations of K lists.
An invalid combination would be:
{ {1,11} ,{2, 3, 6}, {6, 5, 7} }
because the element 6 is present in the second and third list.
I already have code that does this, but it just finds all possible combinations and checks if they are valid before adding them to a final result list. As this list of lists is quite large (153 lists), when K gets bigger the time taken gets ridiculously big too (at K = 5 it takes me about 10 minutes).
I want to see if there's an efficient way of doing this.
Below is my current code (the lists I want to combine are an attribute of the Item class):
public void recursiveComb(List<Item> arr, int len, int startPosition, Item[] result)
{
    if (len == 0)
    {
        if (valid(result.ToList()))
        {
            // Here I add the result to the final list.
            // valid() just checks that no list has elements repeated in another.
        }
        return;
    }
    for (int i = startPosition; i <= arr.Count - len; i++)
    {
        result[result.Length - len] = arr[i];
        recursiveComb(arr, len - 1, i + 1, result);
    }
}
Use a HashSet (https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx) to keep track of distinct elements as you build the output from the candidates in the input list of lists/tuples.
Accumulate an output list of non-overlapping tuples by iterating across the input list of tuples and evaluating each tuple as a candidate as follows: for each input tuple, insert each of its elements into the HashSet. If an element you are trying to insert is already in the set, the tuple fails the constraint and should be skipped; otherwise the tuple's elements are all distinct from the ones already in the output.
The hash set effectively maintains a registry of distinct items in your accepted list of tuples, as the sketch below shows.
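A minimal sketch of that accumulation, assuming int elements and a hypothetical input variable:
// 'input' is your List<List<int>> of candidate tuples (hypothetical name)
var seen = new HashSet<int>();
var output = new List<List<int>>();
foreach (var candidate in input)
{
    // skip the tuple if any of its elements is already registered
    if (candidate.Any(seen.Contains))
        continue;
    output.Add(candidate);
    seen.UnionWith(candidate); // register the tuple's elements
}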
If I understood your code correctly, you are passing each List<int> from your input to the recursiveComb() function, which looks like this:
for (int i = 0; i < inputnestedList.Count; i++)
{
    recursiveComb();
    // Inside recursiveComb() you use one more for loop with recursion.
    // This I observed from your first parameter, i.e. List<int>.
}
Correct me if I am wrong.
This leads to a time complexity of more than O(n^2).
Here is my simplest solution, with two for loops and no recursion:
List<List<int>> x = new List<List<int>>
{
    new List<int>() { 1, 2, 3 },
    new List<int>() { 1, 11 },
    new List<int>() { 2, 3, 6 },
    new List<int>() { 6, 5, 7 },
    new List<int>() { 4, 8, 9 }
};
List<List<int>> result = new List<List<int>>();
var watch = Stopwatch.StartNew();
for (int i = 0; i < x.Count; i++)
{
    int temp = 0;
    for (int j = 0; j < x.Count; j++)
    {
        if (i != j && x[i].Intersect(x[j]).Any())
            temp++;
    }
    // This condition decides whether elements of the ith list appear in other lists.
    if (temp <= 1)
        result.Add(x[i]);
}
watch.Stop();
var elapsedMs = watch.Elapsed.TotalMilliseconds;
Console.WriteLine(elapsedMs);
When I print the execution time, the output is:
Execution Time: 11.4628
Check the execution time of your code; if it is higher than mine, then you can consider this the more efficient code.
Proof of code: DotNetFiddle
Happy coding!
If I understood your problem correctly, then this will work:
/// <summary>
/// Get unique list sets.
/// </summary>
/// <param name="sets"></param>
/// <returns></returns>
public List<List<T>> GetUniqueSets<T>(List<List<T>> sets)
{
    List<List<T>> cache = new List<List<T>>();
    for (int i = 0; i < sets.Count; i++)
    {
        // add to cache if it's empty
        if (cache.Count == 0)
        {
            cache.Add(sets[i]);
            continue;
        }
        else
        {
            // check whether the current item is in the cache and whether it
            // intersects with any of the items in the cache
            var cacheItems = from item in cache
                             where item != sets[i] && item.Intersect(sets[i]).Count() == 0
                             select item;
            // if not, add it to the cache
            if (cacheItems.Count() == cache.Count)
            {
                cache.Add(sets[i]);
            }
        }
    }
    return cache;
}
Tested: it's fast and took 00:00:00.0186033 to find the sets.

How do I check for duplicate answers in this array? (C#)

Sorry for the newbie question, could someone help me out? Simple array here. What's the best/easiest method to check that all the user input is unique and not duplicated? Thanks.
private void btnNext_Click(object sender, EventArgs e)
{
    string[] Numbers = new string[5];
    Numbers[0] = txtNumber1.Text;
    Numbers[1] = txtNumber2.Text;
    Numbers[2] = txtNumber3.Text;
    Numbers[3] = txtNumber4.Text;
    Numbers[4] = txtNumber5.Text;
    foreach (string Result in Numbers)
    {
        lbNumbers.Items.Add(Result);
    }
    txtNumber1.Clear();
    txtNumber2.Clear();
    txtNumber3.Clear();
    txtNumber4.Clear();
    txtNumber5.Clear();
}
I should have added that I need the check to happen before the numbers are output. Thanks.
One simple approach is via LINQ:
bool allUnique = Numbers.Distinct().Count() == Numbers.Length;
Another approach is using a HashSet<string>:
var set = new HashSet<string>(Numbers);
if (set.Count == Numbers.Length) // Numbers is an array, so Length rather than Count
{
    // all unique
}
or with Enumerable.All:
var set = new HashSet<string>();
// HashSet.Add returns true if the item was added, i.e. it was not yet in the set
bool allUnique = Numbers.All(text => set.Add(text));
Enumerable.All is more efficient when the sequence is very large, since it does not build the whole set up front; it adds items one by one and returns false as soon as it detects a duplicate.
Here's a demo of this effect: http://ideone.com/G48CYv
HashSet constructor memory consumption: 50 MB, duration: 00:00:00.2962615
Enumerable.All memory consumption: 0 MB, duration: 00:00:00.0004254
From MSDN:
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
The easiest way, in my opinion, would be to insert all values into a set and then check whether its size equals the array's size. A set can't contain duplicate values, so if any value is a duplicate, it won't be inserted into the set.
This is also fine complexity-wise: insertion into a hash-based set such as .NET's HashSet<T> is O(1) amortized, so the whole check is O(n) (with a tree-based set it would be O(log n) per insert, O(n log n) total).
Equivalently, you can go through the array and put each value found into a hash map while incrementing its count: if the value doesn't exist yet, you add it with count = 1; if it does, you increment its count.
Then you go through the hash map and check that all values have a count of one.
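A minimal sketch of that counting approach, assuming the string[] Numbers from the question:
var counts = new Dictionary<string, int>();
foreach (var value in Numbers)
{
    counts.TryGetValue(value, out int c); // c is 0 if the key is absent
    counts[value] = c + 1;
}
bool allUnique = counts.Values.All(c => c == 1);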
If you are just trying to make sure that your listbox doesn't have dups then use this:
if (!lbNumbers.Items.Contains(Result))
    lbNumbers.Items.Add(Result);
What about this:
public bool arrayContainsDuplicates(string[] array)
{
    // compare each element with every element after it
    // (note the bounds: i runs to the second-to-last element, j to the last)
    for (int i = 0; i < array.Length - 1; i++)
    {
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] == array[j]) return true;
        }
    }
    return false;
}

Random playlist algorithm

I need to create a list of numbers from a range (for example from x to y) in a random order so that every order has an equal chance.
I need this for a music player I write in C#, to create play lists in a random order.
Any ideas?
Thanks.
EDIT: I'm not interested in changing the original list, just in picking random indexes from a range in a random order so that every order has an equal chance.
Here's what I've written so far:
public static IEnumerable<int> RandomIndexes(int count)
{
    if (count > 0)
    {
        int[] indexes = new int[count];
        int indexesCountMinus1 = count - 1;
        for (int i = 0; i < count; i++)
        {
            indexes[i] = i;
        }
        Random random = new Random();
        while (indexesCountMinus1 > 0)
        {
            int currIndex = random.Next(0, indexesCountMinus1 + 1);
            yield return indexes[currIndex];
            indexes[currIndex] = indexes[indexesCountMinus1];
            indexesCountMinus1--;
        }
        yield return indexes[0];
    }
}
It's working, but the only problem is that I need to allocate an array in memory of size count. I'm looking for something that does not require that memory allocation.
Thanks.
This can actually be tricky if you're not careful (i.e., using a naïve shuffling algorithm). Take a look at the Fisher-Yates/Knuth shuffle algorithm for proper distribution of values.
Once you have the shuffling algorithm, the rest should be easy.
Here's more detail from Jeff Atwood.
Lastly, here's Jon Skeet's implementation and description.
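For reference, here is a minimal sketch of an in-place Fisher-Yates shuffle (the textbook algorithm, not any of the linked implementations verbatim):
static void Shuffle<T>(IList<T> list, Random rand)
{
    // walk from the end, swapping each position with a random
    // position at or before it
    for (int i = list.Count - 1; i > 0; i--)
    {
        int j = rand.Next(i + 1); // 0 <= j <= i
        T temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }
}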
EDIT
I don't believe that there's a solution that satisfies your two conflicting requirements (first, to be random with no repeats and second to not allocate any additional memory). I believe you may be prematurely optimizing your solution as the memory implications should be negligible, unless you're embedded. Or, perhaps I'm just not smart enough to come up with an answer.
With that, here's code that will create an array of evenly distributed random indexes using the Knuth-Fisher-Yates algorithm (with a slight modification). You can cache the resulting array, or perform any number of optimizations depending on the rest of your implementation.
private static int[] BuildShuffledIndexArray(int size)
{
    int[] array = new int[size];
    Random rand = new Random();
    for (int currentIndex = array.Length - 1; currentIndex > 0; currentIndex--)
    {
        int nextIndex = rand.Next(currentIndex + 1);
        Swap(array, currentIndex, nextIndex);
    }
    return array;
}

private static void Swap(IList<int> array, int firstIndex, int secondIndex)
{
    // the array starts out all zeros, so lazily fill in the identity value
    // the first time an index is touched
    if (array[firstIndex] == 0)
    {
        array[firstIndex] = firstIndex;
    }
    if (array[secondIndex] == 0)
    {
        array[secondIndex] = secondIndex;
    }
    int temp = array[secondIndex];
    array[secondIndex] = array[firstIndex];
    array[firstIndex] = temp;
}
NOTE: You can use ushort instead of int to halve the size in memory, as long as you don't have more than 65,535 items in your playlist. You could always programmatically switch to int if the size exceeds ushort.MaxValue. If I, personally, added more than 65K items to a playlist, I wouldn't be shocked by increased memory utilization.
Remember, too, that this is a managed language. The VM will always reserve more memory than you are using to limit the number of times it needs to ask the OS for more RAM and to limit fragmentation.
EDIT
Okay, last try: we can tweak the performance/memory trade-off. You could create your list of integers, then write it to disk and keep only a pointer to the offset in the file. Every time you need a new number, you just have some disk I/O to deal with. Perhaps you can find a balance here and read N-sized blocks of data into memory, where N is some number you're comfortable with.
Seems like a lot of work for a shuffle algorithm, but if you're dead-set on conserving memory, then at least it's an option.
If you use a maximal linear feedback shift register, you will use O(1) of memory and roughly O(1) time. See here for a handy C implementation (two lines! woo-hoo!) and tables of feedback terms to use.
And here is a solution:
public class MaximalLFSR
{
    private int GetFeedbackSize(uint v)
    {
        uint r = 0;
        while ((v >>= 1) != 0)
        {
            r++;
        }
        if (r < 4)
            r = 4;
        return (int)r;
    }

    static uint[] _feedback = new uint[] {
        0x9, 0x17, 0x30, 0x44, 0x8e,
        0x108, 0x20d, 0x402, 0x829, 0x1013, 0x203d, 0x4001, 0x801f,
        0x1002a, 0x2018b, 0x400e3, 0x801e1, 0x10011e, 0x2002cc, 0x400079, 0x80035e,
        0x1000160, 0x20001e4, 0x4000203, 0x8000100, 0x10000235, 0x2000027d, 0x4000016f, 0x80000478
    };

    private uint GetFeedbackTerm(int bits)
    {
        if (bits < 4 || bits >= 28)
            throw new ArgumentOutOfRangeException("bits");
        return _feedback[bits];
    }

    public IEnumerable<int> RandomIndexes(int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");
        int bitsForFeedback = GetFeedbackSize((uint)count);
        Random r = new Random();
        uint i = (uint)(r.Next(1, count - 1));
        uint feedback = GetFeedbackTerm(bitsForFeedback);
        int valuesReturned = 0;
        while (valuesReturned < count)
        {
            if ((i & 1) != 0)
            {
                i = (i >> 1) ^ feedback;
            }
            else
            {
                i = (i >> 1);
            }
            if (i <= count)
            {
                valuesReturned++;
                yield return (int)(i - 1);
            }
        }
    }
}
Now, I selected the feedback terms (badly) at random from the link above. You could also implement a version that has multiple maximal terms and selects one of those at random, but you know what? This is pretty dang good for what you want.
Here is test code:
static void Main(string[] args)
{
    while (true)
    {
        Console.Write("Enter a count: ");
        string s = Console.ReadLine();
        int count;
        if (Int32.TryParse(s, out count))
        {
            MaximalLFSR lfsr = new MaximalLFSR();
            foreach (int i in lfsr.RandomIndexes(count))
            {
                Console.Write(i + ", ");
            }
        }
        Console.WriteLine("Done.");
    }
}
Be aware that maximal LFSRs never generate 0. I've hacked around this by returning the i term minus 1, which works well enough. Also, since you want to guarantee uniqueness, I ignore anything out of range: the LFSR only generates sequences up to powers of two, so in high ranges it will generate at worst 2x-1 too many values. These get skipped, which is still faster than Fisher-Yates-Knuth.
Personally, for a music player, I wouldn't generate a shuffled list, and then play that, then generate another shuffled list when that runs out, but do something more like:
IEnumerable<Song> GetSongOrder(List<Song> allSongs)
{
    var random = new Random();
    var playOrder = new List<Song>();
    while (true)
    {
        // this step assigns an integer weight to each song,
        // corresponding to how likely it is to be played next:
        // a song heard within the last ten plays gets weight 1,
        // anything else gets weight 50.
        // in a better implementation, this would look at the total number of
        // songs as well, and provide a smoother ramp up/down.
        var weights = allSongs
            .Select(x => playOrder.LastIndexOf(x) > playOrder.Count - 10 ? 1 : 50)
            .ToList();
        int position = random.Next(weights.Sum());
        foreach (int i in Enumerable.Range(0, allSongs.Count))
        {
            position -= weights[i];
            if (position < 0)
            {
                var song = allSongs[i];
                playOrder.Add(song);
                yield return song;
                break;
            }
        }
        // trim playOrder to prevent unbounded memory use here as well.
        if (playOrder.Count > allSongs.Count * 10)
            playOrder = playOrder.Skip(allSongs.Count * 8).ToList();
    }
}
This would pick songs at random, as long as they haven't been recently played. It provides a "smoother" transition from the end of one shuffle to the next, because with a plain reshuffle the first song of the next shuffle could be the same as the last song of the previous one with probability 1/(total songs), whereas this algorithm has a lower (and configurable) chance of hearing one of the last x songs again.
Unless you shuffle the original song list (which you said you don't want to do), you are going to have to allocate some additional memory to accomplish what you're after.
If you generate the random permutation of song indices beforehand (as you are doing), you obviously have to allocate some non-trivial amount of memory to store it, either encoded or as a list.
If the user doesn't need to be able to see the list, you could generate the random song order on the fly: After each song, pick another random song from the pool of unplayed songs. You still have to keep track of which songs have already been played, but you can use a bitfield for that. If you have 10000 songs, you just need 10000 bits (1250 bytes), each one representing whether the song has been played yet.
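Here is a rough sketch of that bitfield idea using System.Collections.BitArray (songCount and the "play" step are placeholders):
var rand = new Random();
var played = new BitArray(songCount); // one bit per song, all false initially
int remaining = songCount;
while (remaining > 0)
{
    int candidate = rand.Next(songCount);
    if (played[candidate])
        continue; // already played, pick again
    played[candidate] = true;
    remaining--;
    // play the song at index 'candidate' here
}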
I don't know your exact limitations, but I have to wonder if the memory required to store a playlist is significant compared to the amount required for playing audio.
There are a number of methods of generating permutations without needing to store the state. See this question.
I think you should stick with your current solution (the one in your edit).
To do a re-order with no repetitions without making your code behave unreliably, you have to track what you have already used, either directly by keeping unused indexes or indirectly by swapping within the original list.
I suggest checking whether it matters in the context of the working application, i.e. whether the memory it uses is significant compared to the memory used by other pieces of the system.
From a logical standpoint, it is possible. Given a list of n songs, there are n! permutations; if you assign each permutation a number from 1 to n! (or 0 to n!-1 :-D) and pick one of those numbers at random, you can then store the number of the permutation that you are currently using, along with the original list and the index of the current song within the permutation.
For example, if you have a list of songs {1, 2, 3}, your permutations are:
0: {1, 2, 3}
1: {1, 3, 2}
2: {2, 1, 3}
3: {2, 3, 1}
4: {3, 1, 2}
5: {3, 2, 1}
So the only data I need to track is the original list ({1, 2, 3}), the current song index (e.g. 1) and the index of the permutation (e.g. 3). Then, if I want to find the next song to play, I know it's third (2, but zero-based) song of permutation 3, e.g. Song 1.
However, this method relies on you having an efficient means of determining the ith song of the jth permutation, which, until I've had a chance to think (or someone with a stronger mathematical background interjects), is equivalent to "then a miracle happens". But the principle is there.
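For what it's worth, that step has a standard construction: the factorial number system (Lehmer code), which decodes a permutation index digit by digit while picking from the remaining songs. A sketch, assuming the permutation index fits in a long (note it still needs O(n) working memory per decode, so it's illustrative rather than a free lunch):
static int SongAtPosition(IList<int> songs, long permutationIndex, int position)
{
    var remaining = new List<int>(songs);
    long j = permutationIndex;
    for (int k = 0; k < songs.Count; k++)
    {
        long fact = Factorial(remaining.Count - 1);
        int pick = (int)(j / fact); // which of the remaining songs comes next
        j %= fact;
        int song = remaining[pick];
        remaining.RemoveAt(pick);
        if (k == position)
            return song;
    }
    throw new ArgumentOutOfRangeException(nameof(position));
}

static long Factorial(int n)
{
    long f = 1;
    for (int i = 2; i <= n; i++) f *= i;
    return f;
}
For the example above, decoding permutation index 3 over {1, 2, 3} yields {2, 3, 1}, matching the table.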
If memory was really a concern after a certain number of records, and it's safe to say that once that memory boundary is reached there are enough items in the list that some repeats don't matter (as long as the same song is not repeated twice in a row), I would use a combination method.
Case 1: If count < max memory constraint, generate the playlist ahead of time and use Knuth shuffle (see Jon Skeet's implementation, mentioned in other answers).
Case 2: If count >= max memory constraint, the song to be played will be determined at run time (I'd do it as soon as the song starts playing, so the next song is already generated by the time the current song ends). Save the last [max memory constraint, or some token value] number of songs played; generate a random number (R) between 1 and the song count, and if R equals one of the last X songs played, generate a new R until it is not in the list. Play that song.
Your max memory constraints will always be upheld, although performance can suffer in case 2 if you've played a lot of songs/get repeat random numbers frequently by chance.
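A rough sketch of case 2, with hypothetical names (historySize is the memory bound):
static Queue<int> recentlyPlayed = new Queue<int>();

static int PickNextSong(Random rand, int songCount, int historySize)
{
    int r;
    do
    {
        r = rand.Next(songCount);
    } while (recentlyPlayed.Contains(r)); // re-roll if among the last N played
    recentlyPlayed.Enqueue(r);
    if (recentlyPlayed.Count > historySize)
        recentlyPlayed.Dequeue(); // forget the oldest entry
    return r;
}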
You could use a trick we do in SQL Server to order sets randomly, using a GUID. The values are always distributed equally randomly:
private IEnumerable<int> RandomIndexes(int startIndexInclusive, int endIndexInclusive)
{
    if (endIndexInclusive < startIndexInclusive)
        throw new Exception("endIndex must be equal to or higher than startIndex");

    // capacity is the number of elements in the inclusive range, hence the + 1
    List<int> originalList = new List<int>(endIndexInclusive - startIndexInclusive + 1);
    for (int i = startIndexInclusive; i <= endIndexInclusive; i++)
        originalList.Add(i);

    return from i in originalList
           orderby Guid.NewGuid()
           select i;
}
You're going to have to allocate some memory, but it doesn't have to be a lot. You can reduce the memory footprint by using a bool array instead of int. In the best case that would only use (count / 8) bytes of memory, though in practice C# represents a bool[] with one byte per element (System.Collections.BitArray does pack to single bits).
public static IEnumerable<int> RandomIndexes(int count)
{
    Random rand = new Random();
    bool[] used = new bool[count];
    int i;
    for (int counter = 0; counter < count; counter++)
    {
        while (used[i = rand.Next(count)]) ; // i = some random unused value
        used[i] = true;
        yield return i;
    }
}
Hope that helps!
As many others have said, you should implement first, THEN optimize, and only optimize the parts that need it (which you check with a profiler). I offer a (hopefully) elegant method of getting the list you need, which doesn't really care so much about performance:
using System;
using System.Collections.Generic;
using System.Linq;

namespace Test
{
    class Program
    {
        static void Main(string[] a)
        {
            Random random = new Random();
            List<int> list1 = new List<int>(); // source list (assumed to be populated elsewhere)
            List<int> list2 = new List<int>();
            list2 = random.SequenceWhile(i =>
                {
                    if (list2.Contains(i))
                    {
                        return false; // duplicate, do not accept it
                    }
                    list2.Add(i);
                    return true;
                },
                () => list2.Count != list1.Count, // keep going until we have as many as the source
                list1.Count).ToList();
        }
    }

    public static class RandomExtensions
    {
        public static IEnumerable<int> SequenceWhile(
            this Random random,
            Func<int, bool> accept,
            Func<bool> continuationCondition,
            int maxValue)
        {
            int current = random.Next(maxValue);
            while (continuationCondition())
            {
                if (accept(current))
                {
                    yield return current;
                }
                current = random.Next(maxValue);
            }
        }
    }
}
It is pretty much impossible to do this without allocating extra memory. If you're worried about the amount of extra memory allocated, you could always pick a random subset and shuffle within it. You'll get repeats before every song is played, but with a sufficiently large subset I'll warrant few people will notice.
const int MaxItemsToShuffle = 20;

public static IEnumerable<int> RandomIndexes(int count)
{
    Random random = new Random();
    int indexCount = Math.Min(count, MaxItemsToShuffle);
    int[] indexes = new int[indexCount];
    if (count > MaxItemsToShuffle)
    {
        // selection sampling: at index i there are (count - i) items left,
        // of which we still need subsetCount
        int cur = 0, subsetCount = MaxItemsToShuffle;
        for (int i = 0; i < count; i += 1)
        {
            if (random.NextDouble() <= ((float)subsetCount / (float)(count - i)))
            {
                indexes[cur] = i;
                cur += 1;
                subsetCount -= 1;
            }
        }
    }
    else
    {
        for (int i = 0; i < count; i += 1)
        {
            indexes[i] = i;
        }
    }
    for (int i = indexCount; i > 0; i -= 1)
    {
        int curIndex = random.Next(0, i);
        yield return indexes[curIndex];
        indexes[curIndex] = indexes[i - 1];
    }
}

Faster way to do a List<T>.Contains()

I am trying to do what I think is a "de-intersect" (I'm not sure what the proper name is, but that's what Tim Sweeney of EpicGames called it in the old UnrealEd)
// foo and bar have some identical elements (given a case-insensitive match)
List<string> foo = GetFoo();
List<string> bar = GetBar();
// remove non matches
foo = foo.Where(x => bar.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();
bar = bar.Where(x => foo.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();
Then later on, I do another thing where I subtract the result from the original, to see which elements I removed. That's super-fast using .Except(), so no troubles there.
There must be a faster way to do this, because this performs quite badly with ~30,000 elements (of string) in either List. Preferably, a method that does this step and the later one in one fell swoop would be nice. I tried using .Exists() instead of .Contains(), but it's slightly slower. I feel a bit thick, but I think it should be possible with some combination of .Except() and .Intersect() and/or .Union().
This operation can be called a symmetric difference.
You need a different data structure, like a hash table. Add the intersection of both sets to it, then difference the intersection from each set.
UPDATE:
I got a bit of time to try this in code. I used HashSet<T> with a set of 50,000 strings, from 2 to 10 characters long with the following results:
Original: 79499 ms
Hashset: 33 ms
BTW, there is a method on HashSet called SymmetricExceptWith which I thought would do the work for me, but it actually adds the differing elements from both sets to the set the method is called on. Maybe this is what you want, rather than leaving the initial two sets unmodified, and the code would be more elegant.
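For completeness, a minimal sketch of that SymmetricExceptWith variant, reusing foo and bar from the question:
// SymmetricExceptWith mutates the set it is called on,
// leaving the elements that appear in exactly one of the two sequences.
var symDiff = new HashSet<string>(foo, StringComparer.InvariantCultureIgnoreCase);
symDiff.SymmetricExceptWith(bar);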
Here is the code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // foo and bar have some identical elements (given a case-insensitive match)
        var foo = getRandomStrings();
        var bar = getRandomStrings();
        var timer = new Stopwatch();
        timer.Start();
        // original approach: keep the elements unique to each list
        var f = foo.Where(x => !bar.Contains(x)).ToList();
        var b = bar.Where(x => !foo.Contains(x)).ToList();
        timer.Stop();
        Debug.WriteLine(String.Format("Original: {0} ms", timer.ElapsedMilliseconds));
        timer.Reset();
        timer.Start();
        var intersect = new HashSet<String>(foo);
        intersect.IntersectWith(bar);
        var fSet = new HashSet<String>(foo);
        var bSet = new HashSet<String>(bar);
        fSet.ExceptWith(intersect);
        bSet.ExceptWith(intersect);
        timer.Stop();
        var fCheck = new HashSet<String>(f);
        var bCheck = new HashSet<String>(b);
        Debug.WriteLine(String.Format("Hashset: {0} ms", timer.ElapsedMilliseconds));
        Console.WriteLine("Sets equal? {0} {1}", fSet.SetEquals(fCheck), bSet.SetEquals(bCheck));
        Console.ReadKey();
    }

    static Random _rnd = new Random();
    private const int Count = 50000;

    private static List<string> getRandomStrings()
    {
        var strings = new List<String>(Count);
        var chars = new Char[10];
        for (var i = 0; i < Count; i++)
        {
            var len = _rnd.Next(2, 10);
            for (var j = 0; j < len; j++)
            {
                var c = (Char)_rnd.Next('a', 'z');
                chars[j] = c;
            }
            strings.Add(new String(chars, 0, len));
        }
        return strings;
    }
}
With Intersect it would be done like this:
var matches = (from f in foo
               select f)
              .Intersect(from b in bar
                         select b,
                         StringComparer.InvariantCultureIgnoreCase);
If the elements are unique within each list, you should consider using a HashSet. From MSDN:
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
With a sorted list, you can use binary search.
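For example, a minimal sketch using List&lt;T&gt;.BinarySearch (assuming bar may be sorted in place; "someValue" stands in for the element you are looking up):
// sort once, then every lookup is O(log N)
bar.Sort(StringComparer.InvariantCultureIgnoreCase);
bool found = bar.BinarySearch("someValue", StringComparer.InvariantCultureIgnoreCase) >= 0;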
Contains on a List is an O(N) operation. With a different data structure, such as a sorted list or a Dictionary, you would dramatically reduce your time: accessing a key in a sorted list is usually O(log N), and in a hash-based structure it is usually O(1).
