After reading this very interesting thread on duplicate removal, I ended up with this:
public static IEnumerable<T> deDuplicateCollection<T>(IEnumerable<T> input)
{
var hs = new HashSet<T>();
foreach (T t in input)
if (hs.Add(t))
yield return t;
}
By the way, as I'm brand new to C# and coming from Python, I'm a bit lost between casting and this kind of thing. I was able to compile and build with:
foreach (KeyValuePair<long, List<string>> kvp in d)
{
d[kvp.Key] = (List<string>) deDuplicateCollection(kvp.Value);
}
But I must have missed something here, as I get a "System.InvalidCastException" at runtime. Could you point out the interesting things about casting and where I'm going wrong? Thank you in advance.
First, about the usage of the method.
Drop the cast, invoke ToList() on the result of the method. The result of the method is IEnumerable<string>, this is not a List<string>. The fact the source is originally a List<string> is irrelevant, you don't return the list, you yield return a sequence.
d[kvp.Key] = deDuplicateCollection(kvp.Value).ToList();
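To see why the cast fails, here is a minimal sketch. DeDuplicate is a renamed copy of the method from the question and the sample data is made up; the point is only that the object returned by an iterator method is not a List<string>, while ToList() builds one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the method in the question: an iterator that yields each
// element the first time it is seen.
IEnumerable<T> DeDuplicate<T>(IEnumerable<T> input)
{
    var seen = new HashSet<T>();
    foreach (T item in input)
        if (seen.Add(item))
            yield return item;
}

var source = new List<string> { "a", "b", "a", "c", "b" };
IEnumerable<string> sequence = DeDuplicate(source);

// The runtime object is a compiler-generated iterator, not a List<string>,
// which is why an explicit (List<string>) cast throws InvalidCastException.
bool isList = sequence is List<string>;
Console.WriteLine(isList);                      // False

// ToList() enumerates the sequence and copies the items into a new list.
List<string> deduped = sequence.ToList();
Console.WriteLine(string.Join(",", deduped));   // a,b,c
```

A cast never converts one collection type into another; it only checks whether the object already is of the requested type.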
Second, your deDuplicateCollection method is redundant, Distinct() already exists in the library and performs the same function.
d[kvp.Key] = kvp.Value.Distinct().ToList();
Just be sure you have a using System.Linq; in the directives so you can use these Distinct() and ToList() extension methods.
Finally, you'll notice that after making this change alone you can run into a new exception (an InvalidOperationException) when assigning to the dictionary inside the loop. You should not modify a collection while a foreach is enumerating it. The simplest way to do what you want is to omit the explicit loop entirely. Consider
d = d.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.Distinct().ToList());
This uses another Linq extension method, ToDictionary(). Note: this creates a new dictionary in memory and updates d to reference it. If you need to preserve the original dictionary as referenced by d, then you would need to approach this another way. A simple option here is to build a dictionary to shadow d, and then update d with it.
var shadow = new Dictionary<long, List<string>>();
foreach (var kvp in d)
{
shadow[kvp.Key] = kvp.Value.Distinct().ToList();
}
foreach (var kvp in shadow)
{
d[kvp.Key] = kvp.Value;
}
These two loops are safe, because neither one modifies the dictionary it is enumerating. The price is that you loop twice: once to build the de-duplicated lists and once to write them back, which keeps the original dictionary object in place.
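The difference between the two approaches can be sketched as follows. The sample data and the name alias are made up for illustration: alias stands for any other reference to the same dictionary that exists elsewhere in the program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var d = new Dictionary<long, List<string>>
{
    [1] = new List<string> { "a", "a", "b" },
    [2] = new List<string> { "x", "x" }
};
var alias = d;   // a second reference to the same dictionary object

// Approach 1: ToDictionary builds a new dictionary and rebinds d to it.
d = d.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.Distinct().ToList());
bool sameObject = ReferenceEquals(d, alias);
int aliasCountAfterRebind = alias[1].Count;   // alias still sees the old list

// Approach 2: update the original object in place through a shadow copy.
var shadow = new Dictionary<long, List<string>>();
foreach (var kvp in alias)
    shadow[kvp.Key] = kvp.Value.Distinct().ToList();
foreach (var kvp in shadow)
    alias[kvp.Key] = kvp.Value;   // enumerating shadow, writing to alias
int aliasCountAfterInPlace = alias[1].Count;

Console.WriteLine($"{sameObject} {aliasCountAfterRebind} {aliasCountAfterInPlace}");
```

This prints "False 3 2": rebinding d leaves other holders of the old dictionary untouched, whereas the shadow approach changes the object they share.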
d[kvp.Key] = kvp.Value.Distinct().ToList();
There is already a Distinct extension method to remove duplicates!
I am coding a C# forms application, and would like to know if the following two functions achieve the same result:
public List<object> Method1(int parentId)
{
List<object> allChildren = new List<object>();
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
allChildren.Add(item);
allChildren.AddRange(Method1(item.id));
}
return allChildren;
}
public IEnumerable<object> Method2(int parentId)
{
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
yield return item;
foreach (var itemy in Method2(item.id))
{
yield return itemy;
}
}
}
Am I correct in saying that the Method1 function is more efficient than the Method2?
Also, can either of the above functions be coded to be more efficient?
EDIT
I am using the function to return some objects that are then displayed in a ListView. I am then looping through these same objects to check if a string occurs.
Thanks.
This depends heavily on what you want to do. For example, if you use FirstOrDefault(p => ....), the yield method can be faster, because nothing has to be stored in a list first: if the first element is the right one, the list method has already paid for building the whole list. (Of course the yield method has overhead too, but as I said, it depends.)
If you want to iterate over and over again over the data then you should go with the list.
It depends on lots of things.
Here are some reasons to use IEnumerable<T> over List<T>:
When you are iterating a part of a collection (e.g. using FirstOrDefault, Any, Take etc.).
When you have a large (or unbounded) collection and you can't ToList() it (e.g. the Fibonacci series).
When you shouldn't use IEnumerable<T> over List<T>:
When you are enumerating a DB query multiple times with different conditions (You may want the results in memory).
When you want to iterate the whole collection more than once - There is no need to create iterators each time.
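As a sketch of the first two points, consider an endless iterator. The Fibonacci generator below is hypothetical and not part of the question's code; it only shows that operators such as Take and First pull just as many items as they need.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endless generator; calling ToList() directly on it
// would never finish.
IEnumerable<long> Fibonacci()
{
    long a = 0, b = 1;
    while (true)
    {
        yield return a;
        (a, b) = (b, a + b);
    }
}

// Take(10) stops pulling after ten items, so only ten values are computed.
List<long> firstTen = Fibonacci().Take(10).ToList();
Console.WriteLine(string.Join(" ", firstTen));   // 0 1 1 2 3 5 8 13 21 34

// First(...) also stops as soon as it finds a match.
long firstOver100 = Fibonacci().First(n => n > 100);
Console.WriteLine(firstOver100);                 // 144
```

A List<T>-returning version of the same method could not exist at all, because it would have to materialize every element before returning.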
Currently coding in C#, I wonder if there is a way to factor the code as presented below
Entity1 = GetByName("EntityName1");
Entity2 = GetByName("EntityName2");
Entity3 = GetByName("EntityName3");
The idea would be to get a single call in the code, factoring the code by placing the entities and the strings in a container and iterating on this container to get a single "GetByName()" line. Is there a way to do this?
You can use LINQ:
var names=new[]{"EntityName1","EntityName2","EntityName3",.....};
var entities=names.Select(name=>GetByName(name)).ToArray();
Without ToArray, Select will return an IEnumerable that will be re-evaluated each time you enumerate it - that is, GetByName will be called again for every element each time you enumerate the enumerable.
ToArray() or ToList will create an array (or list) you can use multiple times.
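To make the re-evaluation concrete, here is a small sketch. The GetByName below is a hypothetical stand-in that only counts its own calls; the real method is not shown in the question.

```csharp
using System;
using System.Linq;

int calls = 0;

// Hypothetical stand-in for GetByName that counts how often it runs.
string GetByName(string name)
{
    calls++;
    return "entity:" + name;
}

var names = new[] { "EntityName1", "EntityName2", "EntityName3" };

// Building the query runs nothing yet.
var lazy = names.Select(name => GetByName(name));
int callsBeforeEnumeration = calls;

// Each enumeration re-runs the selector for every element.
foreach (var e in lazy) { }
foreach (var e in lazy) { }
int callsAfterTwoPasses = calls;

// ToArray() runs the selector once per element and caches the results.
calls = 0;
var cached = names.Select(name => GetByName(name)).ToArray();
foreach (var e in cached) { }
foreach (var e in cached) { }
int callsWithToArray = calls;

Console.WriteLine($"{callsBeforeEnumeration} {callsAfterTwoPasses} {callsWithToArray}");
```

This prints "0 6 3": no calls when the query is built, three per pass over the lazy sequence, and three in total once the results are stored in an array.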
You can also call ToDictionary if you want to be able to access the entities by name:
var entities=names.ToDictionary(name=>name,name=>GetByName(name));
All this assumes that the entities don't already exist or that GetByName has to do some significant work to retrieve them. If the entities exist you can simply put them in a Dictionary<String,Entity>. If the entities have a Name property you can use ToDictionary to create the dictionary in one statement:
var entities=new []{entity1,entity2,entity3,...};
var entitiesDict=entities.ToDictionary(entity=>entity.Name,entity=>entity);
Do you mean something like the below (where entities is the collection of Entity1, Entity2 & Entity3):
var results = entities.Select(e => GetByName(e.Name));
It depends on what you're looking for. If you need to set the variables in a single line, that won't work. You could play with reflection if you're dealing with fields or properties, but honestly that seems messier than what you've got already.
If the data-structure doesn't matter, and you just need the data and are able to play with it as you see so fit, I'd probably enumerate it into a dictionary. Of course, that's pretty tightly coupled to what you've got now, which looks like it's a fake implementation anyway.
If you want to do that, it's pretty straight-forward. It's your choice how you create the IEnumerable<string> that's represented below as entityNames. You could use an array initializer as I do, you could use a List<string> that you build over time, you could even yield return it in its own method.
var entityNames = new[] { "EntityName1", "EntityName2", "EntityName3" };
var dict = entityNames.ToDictionary(c => c, c => GetByName(c));
Then it's just a matter of checking those.
var entity1 = dict["EntityName1"];
Or enumerating over the dictionary.
foreach(var kvp in dict)
{
Console.WriteLine("{0} - {1}", kvp.Key, kvp.Value);
}
But realistically, it's hard to know whether that's preferable to what you've already got.
Ok, here is an idea.
You can declare this function.
IReadOnlyDictionary<string, T> InstantiateByName<T>(
    Func<string, T> instantiator,
    params string[] names)
{
    // Distinct() drops repeated names so ToDictionary never sees a duplicate key.
    return names.Distinct().ToDictionary(
        name => name,
        name => instantiator(name));
}
which you could call like this,
var entities = InstantiateByName(
GetByName,
"EntityName1",
"EntityName2",
"EntityName3");
To push the over-engineering to the next level,
you can install the Immutable Collections package,
PM> Install-Package Microsoft.Bcl.Immutable
and modify the function slightly,
using System.Collections.Immutable;
IReadOnlyDictionary<string, T> InstantiateByName<T>(
    Func<string, T> instantiator,
    IEqualityComparer<string> keyComparer,
    IEqualityComparer<T> valueComparer,
    params string[] names)
{
    // A params array must be the last parameter, so the comparers come first.
    // Pass null for either comparer to get the default one.
    if (keyComparer == null)
    {
        keyComparer = EqualityComparer<string>.Default;
    }
    if (valueComparer == null)
    {
        valueComparer = EqualityComparer<T>.Default;
    }
    return names.ToImmutableDictionary(
        name => name,
        name => instantiator(name),
        keyComparer,
        valueComparer);
}
The function is used in almost the same way, except that the two comparers (or null for the defaults) are passed before the names. The caller is now responsible for passing unique keys to the function, but an alternative equality comparer can be supplied.
I am trying to remove object while I am iterating through Collection. But I am getting exception. How can I achieve this?
Here is my code :
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
{
gems.Remove(gem.Key); // I can't do this here, then How can I do?
OnGemCollected(gem.Value, Player);
}
}
foreach is designed for iterating over a collection without modifying it.
To remove items from a collection while iterating over it use a for loop from the end to the start of it.
for (int i = gems.Count - 1; i >= 0; i--)
{
    Gem gem = gems[i]; // Assuming gems is a List<Gem>
    gem.Update(gameTime);
    if (gem.BoundingCircle.Intersects(Player.BoundingRectangle))
    {
        gems.RemoveAt(i); // Safe: only indexes above i shift, and those are already done
        OnGemCollected(gem, Player);
    }
}
If it's a Dictionary<string, Gem> for example, you could iterate like this:
foreach(string s in gems.Keys.ToList())
{
if(gems[s].BoundingCircle.Intersects(Player.BoundingRectangle))
{
gems.Remove(s);
}
}
The easiest way is to do what @IV4 suggested:
foreach (var gem in gems.ToList())
The ToList() will convert the Dictionary to a list of KeyValuePair, so it will work fine.
The only time you wouldn't want to do it that way is if you have a big dictionary from which you are only removing relatively few items and you want to reduce memory use.
Only in that case would you want to use one of the following approaches:
Make a list of the keys as you find them, then have a separate loop to remove the items:
List<KeyType> keysToRemove = new List<KeyType>();
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
{
OnGemCollected(gem.Value, Player);
keysToRemove.Add(gem.Key);
}
}
foreach (var key in keysToRemove)
gems.Remove(key);
(Where KeyType is the type of key you're using. Substitute the correct type!)
Alternatively, if it is important that the gem is removed before calling OnGemCollected(), then (with key type TKey and value type TValue) do it like this:
var itemsToRemove = new List<KeyValuePair<TKey, TValue>>();
foreach (var gem in gems)
{
gem.Value.Update(gameTime);
if (gem.Value.BoundingCircle.Intersects(Player.BoundingRectangle))
itemsToRemove.Add(gem);
}
foreach (var item in itemsToRemove)
{
gems.Remove(item.Key);
OnGemCollected(item.Value, Player);
}
As the other answers say, a foreach is designed purely for iterating over a collection without modifying it, as per the documentation:
The foreach statement is used to iterate through the collection to get
the desired information, but should not be used to change the contents
of the collection to avoid unpredictable side effects.
In order to do this you would need to use a for loop (storing the items of the collection you need to remove) and remove them from the collection afterwards.
However if you are using a List<T> you could do this:
lines.RemoveAll(line => line.FullfilsCertainConditions());
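A minimal, self-contained sketch of RemoveAll, using made-up integer data instead of the lines collection above:

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// RemoveAll takes a predicate, removes every matching element in one pass,
// and returns how many elements were removed.
int removed = numbers.RemoveAll(n => n % 2 == 0);

Console.WriteLine(removed);                      // 3
Console.WriteLine(string.Join(",", numbers));    // 1,3,5
```

Because the list does the removal itself, there is no enumerator to invalidate and no index bookkeeping in the calling code.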
After going through all the answers, which are all equally good, I faced a case where I had to modify a List, and what I ended up doing worked quite well for me, so I'm posting it in case anyone finds it useful. Can someone give me feedback on how efficient it is?
Action removeFromList = null;
foreach (var value in listOfValues)
{
    if (ShouldRemove(value)) // placeholder for whatever the removal condition is
    {
        // Queue the removal; nothing is removed while the foreach is running.
        removeFromList += () => listOfValues.Remove(value);
    }
}
removeFromList?.Invoke();
removeFromList = null;
You should use a for loop instead of a foreach loop.
Collections support the foreach statement through an enumerator. Enumerators can be used to read the data in the collection, but they cannot be used to modify the underlying collection. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to MoveNext or Reset throws an InvalidOperationException.
Use for loop for collection modifying.
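The invalidation described above can be observed with a short sketch on a List<int>. The data is made up; the second half shows the backwards for loop as one way around it.

```csharp
using System;
using System.Collections.Generic;

var items = new List<int> { 1, 2, 3 };
bool threw = false;
try
{
    foreach (int item in items)
    {
        if (item == 2)
            items.Remove(item);   // invalidates the enumerator used by foreach
    }
}
catch (InvalidOperationException)
{
    // Thrown by the next MoveNext call after the list was changed.
    threw = true;
}
Console.WriteLine(threw);   // True

// A for loop uses no enumerator. Walking backwards means a removal only
// shifts elements that have already been visited.
var values = new List<int> { 1, 2, 3, 4 };
for (int i = values.Count - 1; i >= 0; i--)
{
    if (values[i] % 2 == 0)
        values.RemoveAt(i);
}
Console.WriteLine(string.Join(",", values));   // 1,3
```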
I am curious as to what restrictions necessitated the design decision to not have HashSet's be able to use LINQ's ForEach query.
What's really going on differently behind the scenes for these two implementations:
var myHashSet = new HashSet<T>();
foreach (var item in myHashSet) { item.Stuff(); }
vs
var myHashSet = new HashSet<T>();
myHashSet.ForEach(item => item.Stuff());
I'm (pretty) sure that this is just because HashSet does not implement IEnumerable -- but what is a normal ForEach loop doing differently that makes it more supported by a HashSet?
Thanks
LINQ doesn't have ForEach. Of the standard collections, it is List<T> that has an instance ForEach method (arrays have the static Array.ForEach).
It's also important to note that HashSet does implement IEnumerable<T>.
Remember, LINQ stands for Language INtegrated Query. It is meant to query collections of data. ForEach has nothing to do with querying. It simply loops over the data. Therefore it really doesn't belong in LINQ.
LINQ is meant to query data, I'm guessing it avoided ForEach() because there's a chance it could mutate data that would affect the way the data could be queried (i.e. if you changed a field that affected the hash code or equality).
You may be confused with the fact that List<T> has a ForEach()?
It's easy enough to write one, of course, but it should be used with caution because of those aforementioned concerns...
public static class EnumerableExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
if (source == null) throw new ArgumentNullException("source");
if (action == null) throw new ArgumentNullException("action");
foreach(var item in source)
{
action(item);
}
}
}
var myHashSet = new HashSet<T>();
myHashSet.ToList().ForEach( x => x.Stuff() );
The first uses the GetEnumerator method of HashSet.
The second uses a ForEach method.
Maybe the second uses GetEnumerator behind the scenes too, but I'm not sure.
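What foreach does behind the scenes can be sketched by hand. The expansion below is an approximation of what the compiler generates, not its exact output, and the set contents are made up.

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<int> { 1, 2, 3 };

// HashSet<T> does implement IEnumerable<T>, which is all foreach needs.
bool isEnumerable = set is IEnumerable<int>;

int sumForeach = 0;
foreach (int item in set)
    sumForeach += item;

// Roughly the same loop written against the enumerator directly.
int sumManual = 0;
var enumerator = set.GetEnumerator();
try
{
    while (enumerator.MoveNext())
        sumManual += enumerator.Current;
}
finally
{
    enumerator.Dispose();
}

Console.WriteLine($"{isEnumerable} {sumForeach} {sumManual}");   // True 6 6
```

A ForEach extension method written over IEnumerable<T> ends up running this same GetEnumerator/MoveNext loop internally, with a delegate call per element on top.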
This is a long shot, I know...
Let's say I have a collection
List<MyClass> objects;
and I want to run the same method on every object in the collection, with or without a return value. Before Linq I would have said:
List<ReturnType> results = new List<ReturnType>();
List<int> FormulaResults = new List<int>();
foreach (MyClass obj in objects) {
results.Add(obj.MyMethod());
FormulaResults.Add(ApplyFormula(obj));
}
I would love to be able to do something like this:
List<ReturnType> results = new List<ReturnType>();
results.AddRange(objects.Execute(obj => obj.MyMethod()));
// obviously .Execute() above is fictitious
List<int> FormulaResults = new List<int>();
FormulaResults.AddRange(objects.Execute(obj => ApplyFormula(obj)));
I haven't found anything that will do this. Is there such a thing?
If there's nothing generic like I've posited above, at least maybe there's a way of doing it for the purposes I'm working on now: I have a collection of one object that has a wrapper class:
class WrapperClass {
private WrappedClass wrapped;
public WrapperClass(WrappedClass wc) {
this.wrapped = wc;
}
}
My code has a collection List<WrappedClass> objects and I want to convert that to a List<WrapperClass>. Is there some clever Linq way of doing this, without doing the tedious
List<WrapperClass> result = new List<WrapperClass>();
foreach (WrappedClass obj in objects)
result.Add(new WrapperClass(obj));
Thanks...
Would:
results.AddRange(objects.Select(obj => ApplyFormula(obj)));
do?
or (simpler)
var results = objects.Select(obj => ApplyFormula(obj)).ToList();
I think that the Select() extension method can do what you're looking for:
objects.Select( obj => obj.MyMethod() ).ToList(); // produces List<ReturnType>
objects.Select( obj => ApplyFormula(obj) ).ToList(); // produces List<int>
Same thing for the last case:
objects.Select( obj => new WrapperClass( obj ) ).ToList();
If you have a void method which you want to call, here's a trick you can use with IEnumerable, which doesn't have a ForEach() extension, to create a similar behavior without a lot of effort.
objects.Select( obj => { obj.SomeVoidMethod(); return false; } ).Count();
The Select() will produce a sequence of [false] values after invoking SomeVoidMethod() on each [obj] in the objects sequence. Since Select() uses deferred execution, we call the Count() extension to force each element in the sequence to be evaluated. It works quite well when you want something like a ForEach() behavior.
If the method MyMethod that you want to apply returns an object of type T then you can obtain an IEnumerable<T> of the result of the method via:
var results = objects.Select(o => o.MyMethod());
If the method MyMethod that you want to apply has return type void then you can apply the method via:
objects.ForEach(o => o.MyMethod());
This assumes that objects is of generic type List<>. If all you have is an IEnumerable<>, then you can roll your own ForEach extension method or apply objects.ToList() first and use the above syntax.
The C# compiler maps a LINQ select onto the .Select extension method, defined over IEnumerable (or IQueryable which we'll ignore here). Actually, that .Select method is exactly the kind of projection function that you're after.
LBushkin is correct, but you can actually use LINQ syntax as well...
var query = from o in objects
select o.MyMethod();
You can also run a custom method using the marvelous Jon Skeet's morelinq library
For example if you had a text property on your MyClass that you needed to change in runtime using a method on the same class:
objects = objects.Pipe<MyClass>(c => c.Text = c.UpdateText()).ToList();
This method will now be applied to every object in your list. I love morelinq!
http://www.hookedonlinq.com/UpdateOperator.ashx has an extended Update method you can use. Or you can use a select statement as posted by others.