I have a HashSet<string> with ~50k members. I have another list of objects that I'm iterating through one by one to determine if the object's email exists. If it does, I need to perform some action on the object.
var emailList = db.Emails.Select(s => s.EmailAddress.ToLower()).ToList();
var emailHash = new HashSet<string>(emailList);
var objects = db.Objects.ToList();
// everything is fine up to this point
foreach (var object in objects) {
if (!emailHash.Any(s => s.Equals(object.Email))) { // This takes ~0.3s
Console.WriteLine("Email: {0}", object.Email);
}
}
What can I do to speed up the evaluation of whether or not one string exists in a list of strings?
You are not using the HashSet correctly. LINQ's .Any() evaluates your condition against each element stored in the HashSet until one matches, which is a linear scan.
To check whether an item exists in a HashSet in constant time, O(1), use emailHash.Contains(object.Email).
One obvious change is to not use the Enumerable.Any() LINQ function, which basically negates the advantages of using a hash set by performing a sequential search.
Instead, use HashSet's built-in Contains(string) function:
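foreach (var obj in objects) { // renamed: "object" is a C# keyword and cannot be used as a variable name
    if (!emailHash.Contains(obj.Email)) { // hash lookup instead of a sequential scan
        Console.WriteLine("Email: {0}", obj.Email);
    }
}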
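One detail worth checking: the question lower-cases the addresses stored in the set but not the address being looked up, so a mixed-case email would never match. The sketch below shows one way to sidestep that by giving the HashSet a case-insensitive comparer. The helper name FindMissing and the sample addresses are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmailLookupDemo
{
    // Returns the candidate emails that are NOT in 'known', ignoring case.
    // FindMissing is a hypothetical helper, not part of the question's code.
    public static List<string> FindMissing(IEnumerable<string> known, IEnumerable<string> candidates)
    {
        // The comparer makes Contains case-insensitive, so no ToLower() call is needed.
        var knownSet = new HashSet<string>(known, StringComparer.OrdinalIgnoreCase);
        return candidates.Where(email => !knownSet.Contains(email)).ToList();
    }

    public static void Main()
    {
        var known = new[] { "alice@example.com", "bob@example.com" };
        var candidates = new[] { "Alice@Example.com", "carol@example.com" };

        foreach (var email in FindMissing(known, candidates))
            Console.WriteLine("Email: {0}", email); // only carol@example.com is missing
    }
}
```

Each Contains call is a single hash lookup, so the cost of the loop grows with the number of objects rather than with objects times set size.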
Related
I got a List<string> named Test:
List<string> Test = new List<string>();
I want to add a string to it using Test.Add();, but first I want to check if it already exists in the list.
I thought of something like this:
if (Test.Find("Teststring") != true)
{
Test.Add("Teststring");
}
However, this returns an error.
I assume that you don't want to add the item if it has already been added.
Try this:
if (!Test.Contains("Teststring"))
{
Test.Add("Teststring");
}
Any takes a predicate (a Func<T, bool>) and determines whether any element in a collection matches that condition. You could do this imperatively with a loop, but the Any extension method provides another way.
See this:
bool b1 = Test.Any(item => item == "Teststring");
You can also use:
if (!Test.Contains("Teststring"))
{
...
}
If you don't want to add an item twice, that is a good indicator that you might use a HashSet<T> instead, which is more efficient for lookups and doesn't allow duplicates (like a Dictionary with only keys).
HashSet<string> Test = new HashSet<string>();
bool newString = Test.Add("Teststring");
If you need to keep using the list, use List.Contains to check whether the string is already in it.
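To make the two options concrete, here is a small sketch. HashSet<T>.Add reports through its return value whether the item was actually added, while a List<T> needs an explicit Contains check first. AddIfMissing is a hypothetical helper name used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class AddOnceDemo
{
    // Adds 'item' to the list only if it is not already there.
    // Returns true when the item was added. (Hypothetical helper.)
    public static bool AddIfMissing(List<string> list, string item)
    {
        if (list.Contains(item)) // linear scan over the list
            return false;
        list.Add(item);
        return true;
    }

    public static void Main()
    {
        var set = new HashSet<string>();
        bool first = set.Add("Teststring");  // true: the set did not contain it yet
        bool second = set.Add("Teststring"); // false: already present, set unchanged
        Console.WriteLine("{0} {1} {2}", first, second, set.Count); // True False 1

        var list = new List<string>();
        AddIfMissing(list, "Teststring");
        AddIfMissing(list, "Teststring");
        Console.WriteLine(list.Count); // 1
    }
}
```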
What is the difference between HashSet and List in C#?
But your code suggests that you only want to add duplicates. I assume that this is not intended.
In my opinion you are using the wrong data structure here. You should use a HashSet to avoid duplicates.
Lookup in a HashSet is O(1) on average, whereas in a List it is O(n).
The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
This is how your code should look:
HashSet<string> Test = new HashSet<string>();
Test.Add("Teststring");
To check whether a string is present, use Test.Contains("Teststring");
I am trying to remove an object while iterating through a collection, but I am getting an exception. How can I achieve this?
Here is my code:
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
{
gems.Remove(gem.Key); // I can't do this here, then How can I do?
OnGemCollected(gem.Value, Player);
}
}
foreach is designed for iterating over a collection without modifying it.
To remove items from a collection while iterating over it, use a for loop that runs from the end to the start.
for (int i = gems.Count - 1; i >= 0; i--)
{
    gems[i].Update(gameTime);
    if (gems[i].BoundingCircle.Intersects(Player.BoundingRectangle))
    {
        Gem gem = gems[i];
        gems.RemoveAt(i); // Assuming gems is a List<Gem>
        OnGemCollected(gem, Player);
    }
}
If it's a Dictionary<string, Gem> for example, you could iterate like this:
foreach(string s in gems.Keys.ToList())
{
if(gems[s].BoundingCircle.Intersects(Player.BoundingRectangle))
{
gems.Remove(s);
}
}
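Both patterns above can be tried in isolation. The following is a minimal sketch using plain integers instead of gems. The method names and sample data are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveWhileIteratingDemo
{
    // Removes even numbers in place. Walking backwards means a removal only
    // shifts elements that have already been visited.
    public static void RemoveEvens(List<int> numbers)
    {
        for (int i = numbers.Count - 1; i >= 0; i--)
        {
            if (numbers[i] % 2 == 0)
                numbers.RemoveAt(i);
        }
    }

    // Removes entries with negative values. The loop runs over a snapshot of
    // the keys, so the dictionary itself can change inside the loop.
    public static void RemoveNegatives(Dictionary<string, int> scores)
    {
        foreach (string key in scores.Keys.ToList())
        {
            if (scores[key] < 0)
                scores.Remove(key);
        }
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 6 };
        RemoveEvens(numbers);
        Console.WriteLine(string.Join(", ", numbers)); // 1, 3

        var scores = new Dictionary<string, int> { { "a", 5 }, { "b", -1 } };
        RemoveNegatives(scores);
        Console.WriteLine(scores.Count); // 1
    }
}
```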
The easiest way is to do what @IV4 suggested:
foreach (var gem in gems.ToList())
The ToList() will convert the Dictionary to a list of KeyValuePair, so it will work fine.
The only time you wouldn't want to do it that way is if you have a big dictionary from which you are only removing relatively few items and you want to reduce memory use.
Only in that case would you want to use one of the following approaches:
Make a list of the keys as you find them, then have a separate loop to remove the items:
List<KeyType> keysToRemove = new List<KeyType>();
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
{
OnGemCollected(gem.Value, Player);
keysToRemove.Add(gem.Key);
}
}
foreach (var key in keysToRemove)
gems.Remove(key);
(Where KeyType is the type of key you're using. Substitute the correct type!)
Alternatively, if it is important that the gem is removed before calling OnGemCollected(), then (with key type TKey and value type TValue) do it like this:
var itemsToRemove = new List<KeyValuePair<TKey, TValue>>();
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
itemsToRemove.Add(gem);
}
foreach (var item in itemsToRemove)
{
gems.Remove(item.Key);
OnGemCollected(item.Value, Player);
}
As the other answers say, foreach is designed purely for iterating over a collection without modifying it, as per the documentation:
The foreach statement is used to iterate through the collection to get
the desired information, but should not be used to change the contents
of the collection to avoid unpredictable side effects.
In order to do this you would need to use a for loop (storing the items of the collection you need to remove) and remove them from the collection afterwards.
However if you are using a List<T> you could do this:
lines.RemoveAll(line => line.FullfilsCertainConditions());
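As a runnable illustration of RemoveAll, here is a sketch that strips blank entries from a list of strings. The blank-line condition and the helper name RemoveBlankLines stand in for whatever condition applies in your case.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveAllDemo
{
    // Removes every null, empty, or whitespace-only entry in place
    // and returns how many entries were removed. (Hypothetical helper.)
    public static int RemoveBlankLines(List<string> lines)
    {
        return lines.RemoveAll(line => string.IsNullOrWhiteSpace(line));
    }

    public static void Main()
    {
        var lines = new List<string> { "keep", "", "also keep", "   " };
        int removed = RemoveBlankLines(lines);
        Console.WriteLine("{0} removed, {1} left", removed, lines.Count); // 2 removed, 2 left
    }
}
```

RemoveAll does the filtering in one pass inside the list, so there is no enumerator to invalidate and no index bookkeeping.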
I went through all the answers and found them equally good. I faced a challenge where I had to modify a List, and what I ended up doing worked quite well for me, so I'm posting it in case anyone finds it useful. Can someone give me feedback on how efficient it might be?
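Action removeFromList = null; // must be initialized before += can be applied
foreach (var value in listOfValues)
{
    if (ShouldRemove(value)) // ShouldRemove is a placeholder for your removal condition
    {
        removeFromList += () => listOfValues.Remove(value);
    }
}
removeFromList?.Invoke(); // runs the queued removals after the loop has finished
removeFromList = null;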
You should use a for loop instead of a foreach loop.
Collections support the foreach statement through an enumerator. Enumerators can be used to read the data in the collection, but they cannot be used to modify the underlying collection. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to MoveNext or Reset throws an InvalidOperationException.
Use a for loop when you need to modify the collection.
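The behaviour described above is easy to reproduce. This sketch, with made-up sample values, removes an item from a List<int> inside a foreach and reports whether the enumerator threw.

```csharp
using System;
using System.Collections.Generic;

public static class InvalidatedEnumeratorDemo
{
    // Returns true if modifying the list inside foreach raised
    // InvalidOperationException.
    public static bool ThrowsWhenModifiedInForeach()
    {
        var values = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int value in values)
            {
                if (value == 1)
                    values.Remove(value); // invalidates the enumerator
            }
        }
        catch (InvalidOperationException)
        {
            // Thrown by the next MoveNext call after the removal.
            return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWhenModifiedInForeach()); // True
    }
}
```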
I am using a HashSet to avoid having two or more items with the same value in my collection. I need to iterate over the HashSet and remove some of its values, but unfortunately I can't do so. This is what I am trying to do:
string newValue = "";
HashSet<string> myHashSet;
myHashSet = GetAllValues(); // assume this function fills the HashSet
foreach (string s in myHashSet)
{
newValue = func(s); // assume func sometimes returns s unchanged
if (s != newValue) // and sometimes returns a different value
{
myHashSet.Remove(s);
myHashSet.Add(newValue);
}
}
thanks in advance for your kind help
You cannot modify the container while it's being iterated. The solution would be to project the initial set into a "modified" set using LINQ (Enumerable.Select), and create a new HashSet from the results of the projection.
If func has the appropriate signature you can pass it directly to Enumerable.Select, and since HashSet<T> has a constructor that accepts an IEnumerable<T>, it all comes down to one line:
var modifiedHashSet = new HashSet<string>(myHashSet.Select(func));
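Here is a self-contained sketch of that projection. The Transform helper and the upper-casing function are assumptions for the example, since the question does not say what func does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectHashSetDemo
{
    // Builds a new set by applying 'func' to every element of 'source'.
    // The source set is never modified. (Hypothetical helper.)
    public static HashSet<string> Transform(HashSet<string> source, Func<string, string> func)
    {
        return new HashSet<string>(source.Select(func));
    }

    public static void Main()
    {
        var source = new HashSet<string> { "a", "A", "b" };
        var result = Transform(source, s => s.ToUpperInvariant());

        // "a" and "A" both map to "A", so the new set has two elements.
        Console.WriteLine(result.Count); // 2
    }
}
```

Note that when func maps two different inputs to the same output, the new set collapses them into one element.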
The accepted answer is indeed correct, but if, as in my case, you require modification of the same instance, you can iterate through a copy of the HashSet.
foreach (string s in myHashSet.ToArray()) // ToArray will create a copy
{
newValue = func(s);
if(s != newValue)
{
myHashSet.Remove(s);
myHashSet.Add(newValue);
}
}
After reading this very interesting thread on duplicate removal, I ended up with this:
public static IEnumerable<T> deDuplicateCollection<T>(IEnumerable<T> input)
{
var hs = new HashSet<T>();
foreach (T t in input)
if (hs.Add(t))
yield return t;
}
By the way, as I'm brand new to C# and coming from Python, I'm a bit lost with casting and this kind of thing. I was able to compile and build with:
foreach (KeyValuePair<long, List<string>> kvp in d)
{
d[kvp.Key] = (List<string>) deDuplicateCollection(kvp.Value);
}
But I must have missed something here, as I get a System.InvalidCastException at runtime. Could you point out what I'm getting wrong about casting? Thank you in advance.
First, about the usage of the method.
Drop the cast, invoke ToList() on the result of the method. The result of the method is IEnumerable<string>, this is not a List<string>. The fact the source is originally a List<string> is irrelevant, you don't return the list, you yield return a sequence.
d[kvp.Key] = deDuplicateCollection(kvp.Value).ToList();
Second, your deDuplicateCollection method is redundant: Distinct() already exists in the library and performs the same function.
d[kvp.Key] = kvp.Value.Distinct().ToList();
Just be sure you have a using System.Linq; in the directives so you can use these Distinct() and ToList() extension methods.
Finally, you'll notice making this change alone, you run into a new exception when trying to change the dictionary in the loop. You cannot update the collection in a foreach. The simplest way to do what you want is to omit the explicit loop entirely. Consider
d = d.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.Distinct().ToList());
This uses another Linq extension method, ToDictionary(). Note: this creates a new dictionary in memory and updates d to reference it. If you need to preserve the original dictionary as referenced by d, then you would need to approach this another way. A simple option here is to build a dictionary to shadow d, and then update d with it.
var shadow = new Dictionary<long, List<string>>();
foreach (var kvp in d)
{
shadow[kvp.Key] = kvp.Value.Distinct().ToList();
}
foreach (var kvp in shadow)
{
d[kvp.Key] = kvp.Value;
}
These two loops are safe, but you see you need to loop twice to avoid the problem of updating the original collection while enumerating over it while also preserving the original collection in memory.
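For reference, the ToDictionary approach can be written as a small self-contained program. The Deduplicate helper name and the sample data are assumptions. The dictionary type matches the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeduplicateValuesDemo
{
    // Returns a NEW dictionary whose lists contain each string only once.
    // The input dictionary and its lists are left untouched.
    public static Dictionary<long, List<string>> Deduplicate(Dictionary<long, List<string>> d)
    {
        return d.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.Distinct().ToList());
    }

    public static void Main()
    {
        var d = new Dictionary<long, List<string>>
        {
            { 1L, new List<string> { "x", "x", "y" } },
            { 2L, new List<string> { "z" } }
        };

        d = Deduplicate(d); // d now references the new dictionary
        Console.WriteLine(d[1L].Count); // 2
    }
}
```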
d[kvp.Key] = kvp.Value.Distinct().ToList();
There is already a Distinct extension method to remove duplicates!
For example
var query = myDic.Where(x => !blacklist.Contains(x.Key));
foreach (var item in query)
{
    if (condition)
        blacklist.Add(item.Key + 1); // key is int type
    ret.Add(item);
}
return ret;
Would this code be valid? And how can I improve it?
Updated
I am expecting blacklist.Add(item.Key + 1) to result in a smaller ret than otherwise. The ToList() approach won't achieve my intention in this sense.
Are there any better ideas that are correct and unambiguous?
That is perfectly safe to do and there shouldn't be any problems, as you're not directly modifying the collection that you are iterating over. Though you are making other changes that affect the where clause, it's not going to blow up on you.
The query (as written) is lazily evaluated, so blacklist is updated as you iterate through the collection, and all following iterations will see any newly added items.
The above code is effectively the same as this:
foreach (var item in myDic)
{
    if (!blacklist.Contains(item.Key))
    {
        if (condition)
            blacklist.Add(item.Key + 1);
        ret.Add(item);
    }
}
So what you should get out of this is that as long as you are not directly modifying the collection that you are iterating over (the expression after "in" in the foreach statement), what you are doing is safe.
If you're still not convinced, consider this and what would be written out to the console:
var blacklist = new HashSet<int>(Enumerable.Range(3, 100));
var query = Enumerable.Range(2, 98).Where(i => !blacklist.Contains(i));
foreach (var item in query)
{
Console.WriteLine(item);
if ((item % 2) == 0)
{
var value = 2 * item;
blacklist.Remove(value);
}
}
This prints 2, 4, 8, 16, 32 and 64: each even item that gets printed removes its double from blacklist before the lazy query reaches that value.
Yes. Changing a collection's internal objects is strictly prohibited when iterating over a collection.
UPDATE
I initially made this a comment, but here is a further bit of information:
I should note that my knowledge comes from experience and articles I've read a long time ago. There is a chance that you can execute the code above because (I believe) the query contains references to the selected object within blacklist. blacklist might be able to change, but not query. If you were strictly iterating over blacklist, you would not be able to add to the blacklist collection.
Your code as presented would not throw an exception. The collection being iterated (myDic) is not the collection being modified (blacklist or ret).
What will happen is that each iteration of the loop will evaluate the current item against the query predicate, which would inspect the blacklist collection to see if it contains the current item's key. This is lazily evaluated, so a change to blacklist in one iteration will potentially impact subsequent iterations, but it will not be an error. (blacklist is fully evaluated upon each iteration, its enumerator is not being held.)
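The difference the asker is worried about, lazy evaluation versus a ToList() snapshot, can be seen in a small experiment. This is only a sketch: the Collect helper, the range 1 to 5, and the choice to blacklist key + 1 on every iteration are assumptions made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyFilterDemo
{
    // Collects keys that pass the blacklist filter, blacklisting key + 1
    // for every key that is collected. When 'snapshot' is true, the filter
    // is evaluated up front with ToList() instead of lazily.
    public static List<int> Collect(IEnumerable<int> keys, bool snapshot)
    {
        var blacklist = new HashSet<int>();
        IEnumerable<int> query = keys.Where(k => !blacklist.Contains(k));
        if (snapshot)
            query = query.ToList(); // filter runs now, while blacklist is still empty

        var ret = new List<int>();
        foreach (int key in query)
        {
            blacklist.Add(key + 1); // only affects keys the lazy query has not tested yet
            ret.Add(key);
        }
        return ret;
    }

    public static void Main()
    {
        var keys = Enumerable.Range(1, 5);
        Console.WriteLine(string.Join(", ", Collect(keys, false))); // 1, 3, 5
        Console.WriteLine(string.Join(", ", Collect(keys, true)));  // 1, 2, 3, 4, 5
    }
}
```

With the lazy query, each addition to blacklist shrinks the remaining results, which is the behaviour the asker wants. With the snapshot, the additions come too late to have any effect.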