Removing from a collection while in a foreach with linq - c#

From what I understand, this seems to not be a safe practice...
I have a foreach loop on my list object that I am stepping through. Inside that foreach loop I am looking up records by an Id. Once I have that new list of records returned by that Id I do some parsing and add them to a new list.
What I would like to do is not step through the same Id more than once. So my thought process would be to remove it from the original list. However, this causes an error... and I understand why.
My question is: is there a safe way to go about this, or should I restructure my thought process a bit? I was wondering if anyone had any experience or thoughts on how to solve this issue.
Here is a little pseudocode:
_myList.ForEach(x =>
{
List<MyModel> newMyList = _myList.FindAll(y => y.SomeId == x.SomeId).ToList();
//Here is where I would do some work with newMyList
//Now I am done... time to remove all records with x.SomeId
_myList.RemoveAll(y => y.SomeId == x.SomeId);
});
I know that _myList.RemoveAll(y => y.SomeId == x.SomeId); is wrong, but in theory that would kinda be what I would be looking for.
I have also toyed around with the idea of pushing the used SomeId to an idList and then have it check each time, but that seems cumbersome and was wondering if there was a nicer way to handle what I am looking to do.
Sorry if I didn't explain this that well. If there are any questions, please feel free to comment and I will answer/make edits where needed.

First off, using ForEach in your example isn't a great idea for these reasons.
You're right to think there are performance downsides to iterating through the full list for each remaining SomeId, but even making the list smaller every time would still require another full iteration of that subset (if it even worked).
As was pointed out in the comments, GroupBy on SomeId organizes the elements into groupings for you, and allows you to efficiently step through each subset for a given SomeId, like so:
_myList.GroupBy(x => x.SomeId)
.Select(g => DoSomethingWithGroupedElements(g));
Jon Skeet has an excellent set of articles about how the Linq extensions could be implemented. I highly recommend checking it out for a better understanding of why this would be more efficient.
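For illustration, here is a minimal sketch of that approach applied to the pseudocode in the question (MyModel and SomeId are the question's names; ProcessGroup is a hypothetical stand-in for the parsing work):
// Group once: each grouping contains every record that shares one SomeId,
// so each id is visited exactly once and nothing needs to be removed.
foreach (var group in _myList.GroupBy(x => x.SomeId))
{
    List<MyModel> newMyList = group.ToList();
    ProcessGroup(group.Key, newMyList); // hypothetical parsing/work step
}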

First of all, you cannot modify a collection while a foreach is enumerating it: adding, removing or replacing items invalidates the enumerator and throws an InvalidOperationException (changing properties of the elements themselves is fine). There are a few ways you could handle this situation:
GroupBy
This is the method I would use. You can group your list by the property you want, and iterate through the IGrouping formed this way
var groups = list.GroupBy(x => x.yourProperty);
foreach(var group in groups)
{
//your code
}
Distinct properties list
You could also save properties in another list, and cycle through that list instead of the original one
var propsList = list.Select(x=>x.yourProperty).Distinct();
foreach(var prop in propsList)
{
var tmpList = list.Where(x=>x.yourProperty == prop);
//your code
}
While loop
This will actually do what you originally wanted, but performance may not be optimal
while(list.Any())
{
var prop = list.First().yourProperty;
var tmpList = list.Where(x => x.yourProperty == prop).ToList(); // materialize before RemoveAll, since Where is deferred
//your code
list.RemoveAll(x=>x.yourProperty == prop);
}
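The "used id list" idea from the question also works; here is a minimal sketch of it (using the question's MyModel/SomeId names and assuming SomeId is an int), relying on HashSet<T>.Add returning false for values it has already seen:
var seenIds = new HashSet<int>();
foreach (var x in _myList)
{
    if (!seenIds.Add(x.SomeId))
        continue; // this SomeId was already handled

    var newMyList = _myList.Where(y => y.SomeId == x.SomeId).ToList();
    // work with newMyList here; the original list is never modified while enumerating
}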

Related

C# Update or insert from a List by some criteria

I have 2 lists, both with a different number of items, both with 1 parameter in common that I have to compare. If the value of the parameter is the same I have to update the DB, but if the item in a list doesn't have an item in the second list, I have to insert it into the DB.
This is what I was trying:
foreach (var rep in prodrep)
{
foreach (var crm in prodcrm)
{
if (rep.VEHI_SERIE.Equals(crm.VEHI_SERIE))
{
updateRecord(rep.Data);
}
else
{
insertRecords(rep.Data);
}
}
}
The first problem with this is that it is very slow. The second problem is that obviously the insert statement wouldn't work, but I don't want to do another foreach inside a foreach to verify that it doesn't exist, because that would take double the time.
How can I make this more efficient?
This is not the most efficient approach, but it should work:
var existing = prodrep
    .Where(rep => prodcrm.Any(crm => rep.VEHI_SERIE.Equals(crm.VEHI_SERIE)))
    .Select(rep => new { Rep = rep, Crm = prodcrm.FirstOrDefault(crm => rep.VEHI_SERIE.Equals(crm.VEHI_SERIE)) })
    .ToList();
existing.ForEach(mix => updateRecord(mix.Rep.Data, mix.Crm.Id));
prodrep.Where(rep => !existing.Any(mix => mix.Rep == rep)).ToList().ForEach(rep => insertRecords(rep.Data));
var comparators = prodcrm.Select(i => i.VEHI_SERIE).ToList();
foreach (var rep in prodrep)
{
if (comparators.Contains(rep.VEHI_SERIE))
{
    // do something
}
else
{
    // do something else
}
}
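One small refinement to the above (my suggestion, not part of the original answer): List<T>.Contains is a linear scan, so for large lists a HashSet<string> gives near-constant-time lookups (assuming VEHI_SERIE is a string):
var comparators = new HashSet<string>(prodcrm.Select(i => i.VEHI_SERIE));
foreach (var rep in prodrep)
{
    if (comparators.Contains(rep.VEHI_SERIE))
        updateRecord(rep.Data);
    else
        insertRecords(rep.Data);
}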
See Algorithm to optimize nested loops.
It's an interesting read and a cool trick, though not necessarily something you should apply in every situation.
Also, be careful about answers providing you with LINQ queries. Often it "looks cool" because you're not using the word "for", but really it's just hiding those for loops under the hood.
If you're really concerned about performance and the machine can handle it, you can look at the Task Parallel Library. It's not necessarily going to solve all of your problems, because you can be limited by processor/memory, and you could end up making your application slower.
Is this something that a user of your application is going to be doing regularly? If so, can you make it an asynchronous task that they can come back to later, or is it an offline process that they aren't ever going to see? Depending on usage expectations, sometimes the time something takes isn't the end of the world.

Searching a collection effectively in c#

I have an AsyncObservable collection of some class, say "dashboard". Each item inside the dashboard collection contains a collection of some other class, say "chart". That chart has various properties such as name, type etc. I want to search the collection based on chart name, type and so on. Can anybody suggest a searching technique? Currently I am searching by traversing the whole collection with a foreach and comparing the entered input with each item inside the collection (which is not very efficient when the amount of data is large). I want to make it more efficient - I am using C#.
My code is:
foreach (DashBoard item in this.DashBoards)
{
Chart obj1 = item.CurrentCharts.ToList().Find(chart => chart.ChartName.ToUpper().Contains(searchText.ToUpper()));
if (obj1 != null)
{
if (obj1.IsHighlighted != Colors.Wheat)
obj1.IsHighlighted = Colors.Wheat;
item.IsExpanded = true;
flagList.Add(1);
}
else
{
flagList.Add(0);
}
}
You can use a LINQ query.
For example, you can do something like this. If you post your code, we can solve the problem:
Dashboard.SelectMany(q => q.Chart).Where(a => a.Name == "SomeName")
Here is a related LINQ question for reference: querying nested collections
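Applied to the code in the question, that idea might look roughly like this (a sketch; it keeps the question's case-insensitive Contains match and assumes you still want to highlight each matching chart and expand its dashboard):
var matches = this.DashBoards
    .SelectMany(d => d.CurrentCharts, (d, c) => new { DashBoard = d, Chart = c })
    .Where(x => x.Chart.ChartName.ToUpper().Contains(searchText.ToUpper()))
    .ToList();

foreach (var match in matches)
{
    match.Chart.IsHighlighted = Colors.Wheat;
    match.DashBoard.IsExpanded = true;
}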
Edit: foreach loops or LINQ
The answer is not really clear-cut. There are two sides to any code cost argument: performance and maintainability. The first of these is obvious and quantifiable.
Under the hood LINQ will iterate over the collection, just as foreach will. The difference between LINQ and foreach is that LINQ will defer execution until the iteration begins.
Performance wise take a look at this blog post: http://www.schnieds.com/2009/03/linq-vs-foreach-vs-for-loop-performance.html
In your case:
If the collection is relatively small or medium-sized, I would suggest using foreach for better performance.
At the end of the day:
LINQ is more elegant but usually a bit less efficient, while foreach clutters the code a little but performs better.
On large collections, or where parallel computing makes sense, I would choose LINQ, as the performance gap shrinks to a minimum.
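For the parallel case, PLINQ only needs an AsParallel() call; a minimal sketch reusing the query from the earlier example (again assuming the question's property names, and only worthwhile when the per-element work is heavy enough to pay for the threading overhead):
var matches = this.DashBoards
    .AsParallel()
    .SelectMany(d => d.CurrentCharts)
    .Where(c => c.ChartName.ToUpper().Contains(searchText.ToUpper()))
    .ToList();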

Efficiency of C# Find on 1000+ records

I am essentially trying to see if entities exist in a local context and sort them accordingly. This function seems to be faster than others we have tried (it runs in about 50 seconds for 1000 items), but I am wondering if there is something I can do to improve the efficiency. I believe the Find here is slowing it down significantly, as a simple foreach iteration over 1000 items takes milliseconds and benchmarking shows the bottleneck there. Any ideas would be helpful. Thank you.
Sample code:
foreach(var entity in entities) {
var localItem = db.Set<T>().Find(Key);
if(localItem != null)
{
list1.Add(entity);
}
else
{
list2.Add(entity);
}
}
If this is a database (which, from the comments, I gather it is...)
You would be better off doing fewer queries.
list1.AddRange(db.Set<T>().Where(x => x.Key == Key));
list2.AddRange(db.Set<T>().Where(x => x.Key != Key));
This would be 2 queries instead of 1000+.
Also be aware that by adding each one to a List<T>, you're keeping two large collections in memory. So if 1000+ turns into 10,000,000, you're going to have interesting memory issues.
See this post on my blog for more information: http://www.artisansoftware.blogspot.com/2014/01/synopsis-creating-large-collection-by.html
If I understand correctly, the database seems to be the bottleneck? If you want to (effectively) select data from a database relation whose attribute x should match an ==-criterion, you should consider creating a secondary access path for that attribute (an index structure). Depending on your database system and the value distribution in your table, this might be a hash index (especially good for == checks), a B+-tree (an all-rounder), or whatever your system offers.
However, this only helps if:
you aren't simply fetching the full data set once and living with that in your application;
adding (another) index to the relation is not out of the question (e.g. it may not be worth having one for a single need);
an index would actually be effective - it won't be if, for example, the attribute you are querying on has very few unique values.
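As an illustration only (it assumes EF Core is managing the schema, and uses the hypothetical entity name MyEntity with the KeyValue property that appears later in this thread), a secondary index can be declared in the model configuration:
// Inside your DbContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Creates a secondary access path (index) on KeyValue,
    // so equality lookups on that column don't scan the whole table.
    modelBuilder.Entity<MyEntity>()
        .HasIndex(e => e.KeyValue);
}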
I found your answers very helpful, but here is ultimately how I solved the problem. It seemed .Find was the bottleneck.
var tableDictionary = db.Set<T>().ToDictionary(x => x.KeyValue, x => x);
foreach(var entity in entities) {
if (tableDictionary.ContainsKey(entity.KeyValue))
{
list1.Add(entity);
}
else
{
list2.Add(entity);
}
}
This ran in with 900+ rows in about a 10th of a second which for our purposes was efficient enough.
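A small variation on that (my suggestion, not from the thread): if only membership matters, loading just the keys into a HashSet avoids materializing every entity (this assumes KeyValue is a string; adjust the HashSet type otherwise):
var keys = new HashSet<string>(db.Set<T>().Select(x => x.KeyValue));
foreach (var entity in entities)
{
    if (keys.Contains(entity.KeyValue))
        list1.Add(entity);
    else
        list2.Add(entity);
}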
Rather than querying the DB for each item, you can just do one query, get all of the data (since you want all of the data from the DB eventually) and you can then group it in memory, which can be done (in this case) about as efficiently as in the database. By creating a lookup of whether or not the key is equal, we can easily get the two groups:
var lookup = db.Set<T>().ToLookup(item => item.Key == Key);
var list1 = lookup[true].ToList();
var list2 = lookup[false].ToList();
(You can use AddRange instead if the lists have previous values that should also be in them.)

Projection: filling 2 arrays at once

I thought I would be clever and write something like this code sample. It also seemed like a clean and efficient way to fill an array without enumerating a second time.
int i = 0;
var tickers = new List<string>();
var resultTable = results.Select(result => new Company
{
Ticker = tickers[i++] = result.CompanyTicker,
});
I don't really care for an alternative way to do this, because I can obviously accomplish it easily with a for loop. I'm more interested in why this snippet doesn't work, i.e. why tickers.Count is 0 after the code runs, despite there being 100+ results. Can anyone tell me why I'm getting this unexpected behavior?
You need to iterate your query, for example by calling .ToArray() or .ToList() at the end. Currently you have only created a query; it hasn't been executed yet.
You may see: LINQ and Deferred Execution
Plus, once the query is iterated, I believe your code will throw an ArgumentOutOfRangeException, since your List doesn't have any items.
This is due to LINQ's lazy execution. When the query gets executed (i.e. when you iterate over it), the list should have your results. An easy way to do this is to use ToArray or ToList.
LINQ should ideally not have side effects.
I don't see what would prevent this from being a two step process:
var tickers = results.Select(r => r.CompanyTicker).ToList();
var resultTable = tickers.Select(t => new Company { Ticker = t }).ToList();

Simple C# foreach to LINQ question

Currently have this:
foreach (var series in Chart1.Series)
{
series.Enabled = false;
}
I would like to express this in a simple, one line expression. I thought this would work:
Chart1.Series.Select( series => series.Enabled = false);
This doesn't have any effect, however. Presumably because I just misunderstood how Select was working, which is fine.
My next thought was to do something like Chart1.Series.ForEach( series => series.Enabled = false), but Chart1.Series does not implement IEnumerable (...or at least ForEach is not an acceptable method to call).
I'd rather not do Chart1.Series = Chart1.Series.ToList().ForEach( series => series.Enabled = false);, but maybe that is the simplest option?
The foreach is preferred for what you're trying to do. You're iterating over a sequence of elements and modifying the elements. That's what foreach is for.
Linq is used to take one sequence of elements and generate a new sequence based on some criteria/transformation. Not what you're after.
Simple and one line :)
foreach (var series in Chart1.Series) { series.Enabled = false; }
Your first foreach loop is pretty clear and it works, so why replace it with LINQ? It won't be any clearer or faster.
Otherwise, I don't think it will get simpler than what you say above:
Chart1.Series.ToList().ForEach( series => series.Enabled = false);
As to the specific reason your Select didn't work, bear in mind that Select returns an IEnumerable<T>, which is an object that knows how to enumerate. In other words, you have a reference to something that can enumerate, but that hasn't yet been enumerated. So if you take this:
var series = Chart1.Series.Select( s => s.Enabled = false);
foreach (var s in series) {}
You'll get the effect you intended, because now you're enumerating the Series and therefore calling the Select delegate. This is rather non-intuitive, of course, which is why Select is typically not used in this way (otherwise every time you enumerate, you'll have side effects and enumerating more than once would apply the side effects again.)
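If a one-line call is really wanted, a common workaround (a sketch of my own, not taken from the answers above) is a small ForEach extension for IEnumerable<T>, which keeps the loop eager and the side effect explicit:
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        // Executes immediately, unlike Select, so the side effect happens exactly once.
        foreach (var item in source)
            action(item);
    }
}

// Usage:
// Chart1.Series.ForEach(series => series.Enabled = false);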
