Get indexes of all matching values from list using Linq - c#

Hey Linq experts out there,
I just asked a very similar question and know the solution is probably SUPER easy, but still find myself not being able to wrap my head around how to do this fairly simple task in the most efficient manner using linq.
My basic scenario is that I have a list of values, for example, say:
Lst1:
a
a
b
b
c
b
a
c
a
And I want to create a new list that will hold all the indexes from Lst1 where, say, the value = "a".
So, in this example, we would have:
LstIndexes:
0
1
6
8
Now, I know I can do this with Loops (which I would rather avoid in favor of Linq) and I even figured out how to do this with Linq in the following way:
LstIndexes= Lst1.Select(Function(item As String, index As Integer) index) _
.Where(Function(index As Integer) Lst1(index) = "a").ToList
My challenge with this is that it iterates over the list twice and is therefore inefficient.
How can I get my result in the most efficient way using Linq?
Thanks!!!!

First off, your code doesn't actually iterate over the list twice; it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, lst1.Count)
.Where(i => lst1[i] == "a")
.ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "Whenever someone asks me for an item I'll ask my input sequence for an item; if the function says it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them, what happens is that ToList asks for the first item: it goes to Where to ask it for its first item, Where goes to Select and asks it for its first item, and Select goes to the list to ask it for its first item. The list then provides its first item. Select then transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item and runs its function, which determines that it's true, and so spits out 0 to ToList, which adds it to the list. That whole thing then happens 8 more times, once for each remaining item in the nine-element list. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first to you, it's actually pretty easy for the computer to do all of this. It's not actually as performance intensive as it may seem at first.
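If you want to convince yourself of the single pass, here is a minimal sketch (C# 9+ top-level statements). CountingSource and sourceReads are hypothetical names made up for this illustration; the wrapper just counts how many times the list hands out an item while a Select/Where/ToList chain runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var lst1 = new List<string> { "a", "a", "b", "b", "c", "b", "a", "c", "a" };
var sourceReads = 0;

// Hypothetical wrapper: yields the list items and counts each one handed out.
IEnumerable<string> CountingSource()
{
    foreach (var item in lst1)
    {
        sourceReads++;
        yield return item;
    }
}

var indexes = CountingSource()
    .Select((item, index) => new { item, index }) // pair each item with its position
    .Where(pair => pair.item == "a")              // keep only the matches
    .Select(pair => pair.index)                   // project down to the index
    .ToList();                                    // this is what pulls items through

Console.WriteLine(string.Join(", ", indexes)); // 0, 1, 6, 8
Console.WriteLine(sourceReads);                // 9
```

Three chained operators, but each of the nine items is read from the source only once.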

This works, but arguably not as neat.
var result = list1.Select((x, i) => new {x, i})
.Where(x => x.x == "a")
.Select(x => x.i);

How about this one, it works pretty fine for me.
static void Main(string[] args)
{
List<char> Lst1 = new List<char>();
Lst1.Add('a');
Lst1.Add('a');
Lst1.Add('b');
Lst1.Add('b');
Lst1.Add('c');
Lst1.Add('b');
Lst1.Add('a');
Lst1.Add('c');
Lst1.Add('a');
var result = Lst1.Select((c, i) => new { character = c, index = i })
.Where(list => list.character == 'a')
.Select(item => item.index) // keep only the index, which is what the question asks for
.ToList();
}

Related

Linq to find collection items from second collection very slow

Please let me know if this is already answered somewhere; I can't find it.
In memory, I have a collection of objects in firstList and related objects in secondList. I want to find all items from firstList whose Id matches the RelatedId of some item in secondList. So it's fairly straightforward:
var items = firstList.Where(item => secondList.Any(secondItem => item.Id == secondItem.RelatedId));
But when both the first and second lists get even moderately large, this call is extremely expensive and takes many seconds to complete. Is there a way to break this up, or redesign it into multiple queries to make it more efficient?
The reason this code is so inefficient is that for every element of the first list, it has to iterate over (in the worst case) the entirety of the second list to look for an item with matching id.
A more efficient way to do this using LINQ would be using the Join method as follows:
var items = firstList.Join(secondList, item => item.Id, secondItem => secondItem.RelatedId, (item, _) => item);
If the second collection may contain duplicate IDs, you will additionally have to run Distinct() (possibly with some changes depending on the equality semantics for the members of the first list) on the result to maintain the semantics of your original code.
This code resulted in a roughly 100x speedup for me with a test using two lists of 10000 elements each.
If you're running this operation often and one of the collections does not change, you could consider caching the Ids or RelatedIds in a HashSet instead.
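A minimal sketch of the HashSet variant, under the assumption that the ids are ints; the tuples and sample values below are made up and only stand in for the real item types (C# 9+ top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the real item types.
var firstList = new List<(int Id, string Name)>
{
    (1, "one"), (2, "two"), (3, "three"), (4, "four")
};
var secondList = new List<(int RelatedId, string Note)>
{
    (2, "x"), (4, "y"), (4, "z"), (9, "w")
};

// Build the set once. Duplicate RelatedIds collapse automatically,
// so no Distinct() is needed afterwards.
var relatedIds = new HashSet<int>(secondList.Select(s => s.RelatedId));

// Each Contains call is O(1) on average, so the filter is one pass over firstList.
var items = firstList.Where(item => relatedIds.Contains(item.Id)).ToList();

Console.WriteLine(string.Join(", ", items.Select(i => i.Name))); // two, four
```

If secondList never changes, relatedIds can be kept around and reused across calls.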

How to Update a list with another list efficiently C#

I have two lists that have object elements, one big list let's call it List1 and another small list List2.
I need to update values in List1 with values in List2 based on a condition that is defined in a function that returns a boolean based on the values in the objects.
I have come up with the following implementation which is really taking a lot of time for larger lists.
function to check whether an item will be updated
private static bool CheckMatch(Item item1, Item item2) {
//do some stuff here and return a boolean
}
query I'm using to update the items
In the snippet below, I need to update List1(larger list) with some values in List2(small list)
foreach(var item1 in List1)
{
var matchingItems = List2.Where(item2 => CheckMatch(item1, item2));
if (matchingItems.Any())
{
item1.IsExclude = matchingItems.First().IsExcluded;
item1.IsInclude = matchingItems.First().IsIncluded;
item1.Category = matchingItems.First().Category;
}
}
I'm hoping I will get a solution that is much better than this. I also need to maintain the position of elements in List1
Here is sample of what I'm doing
As LP13's answer points out, you're doing a large amount of re-computation by re-executing a query instead of executing it once and caching the result.
But the larger problem here is that if you have n items in List1 and m potential matches in List2, and you are looking for any match, then worst case you will definitely do n * m matches. If n and m are large, their product is rather larger. And since we're looking for any match, the worst case is when there is no match; you'll definitely try all m possibilities.
Is this cost avoidable? Maybe, but only if we know some trick to take advantage of, and you've made the problem so abstract -- we have two lists and a relation, and no information about either the lists or the relation -- that there is no structure that we can take advantage of.
That said: if you happen to know that there is an element in List2 that is likely to match many items in List1 then put that element first. Any, or FirstOrDefault, will stop executing the Where query after getting the first match, so when that element matches most items you can turn an O(n * m) problem into something close to an O(n) problem.
Without knowing more about what the relation is, it's hard to say how to improve the performance.
UPDATE: A commenter points out that we can do better if we know that the relation is an equivalence relation. Is it an equivalence relation? That is, suppose we have your method that checks two items. Are we guaranteed the following?
The relation is reflexive: CheckMatch(a, a) is always true.
The relation is symmetric: CheckMatch(a, b) is always the same as CheckMatch(b, a)
The relation is transitive: if CheckMatch(a, b) is true and CheckMatch(b, c) is true then CheckMatch(a, c) is always true
If we have those three conditions then you can do considerably better. Such a relation partitions elements into equivalence classes. What you do is associate each item in List1 and List2 with a canonical value. That canonical value is the same for every member of the equivalence class. Build a dictionary from the canonical value of each List2 item to that item; from that dictionary you can then do fast lookups and solve your problem quickly.
But if your relation is not an equivalence relation, this does not work.
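To make the canonical-value idea concrete, here is a rough sketch. Everything specific in it is an assumption: tuples stand in for Item, and "same Code, ignoring case" stands in for CheckMatch, with Canonical as a hypothetical helper. It only applies if your real relation can be expressed as equality of some such key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: tuples stand in for Item.
var list1 = new List<(string Code, string Category)>
{
    ("A1", "?"), ("b2", "?"), ("C3", "?"), ("a1", "?")
};
var list2 = new List<(string Code, string Category)>
{
    ("a1", "fruit"), ("B2", "tools"), ("A1", "ignored duplicate")
};

// Canonical value: identical for every member of an equivalence class.
// Assumption: two items "match" exactly when their codes are equal ignoring case.
string Canonical(string code) => code.ToUpperInvariant();

// Index list2 by canonical value, keeping the first item of each class
// (this mirrors First() in the original loop).
var byCanonical = new Dictionary<string, (string Code, string Category)>();
foreach (var item2 in list2)
{
    var key = Canonical(item2.Code);
    if (!byCanonical.ContainsKey(key))
    {
        byCanonical[key] = item2;
    }
}

// One dictionary lookup per item instead of a scan of list2.
// Updating by index keeps every element at its original position.
for (var i = 0; i < list1.Count; i++)
{
    if (byCanonical.TryGetValue(Canonical(list1[i].Code), out var match))
    {
        list1[i] = (list1[i].Code, match.Category);
    }
}

Console.WriteLine(string.Join(", ", list1.Select(x => $"{x.Code}:{x.Category}")));
// A1:fruit, b2:tools, C3:?, a1:fruit
```

The work drops from roughly n * m calls to CheckMatch to about n + m key computations.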
Can you try this? Where on its own only produces a lazy IEnumerable, so every call to Any() and First() on it re-runs the query against List2. With FirstOrDefault the query runs only once per item:
foreach(var item1 in List1)
{
var matchingItem = List2.Where(item2 => CheckMatch(item1, item2)).FirstOrDefault();
if (matchingItem != null)
{
item1.IsExclude = matchingItem.IsExcluded;
item1.IsInclude = matchingItem.IsIncluded;
item1.Category = matchingItem.Category;
}
}

Selecting items in an ordered list after a certain entry

I have an ordered list of objects. I can easily find an item in the list by using the following code:
purchaseOrders.FirstOrDefault(x => x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase))
What I want to do is select all the items in the list that appear after this entry. How best to achieve this? Would it to be to get the index of this item and select a range?
It sounds like you want SkipWhile:
var orders = purchaseOrders.SkipWhile(x => !x.OurRef.Equals(...));
Once the iterator has stopped skipping, it doesn't evaluate the predicate for later entries.
Note that that code will include the entry that doesn't match the predicate, i.e. the one with the given reference. It will basically give you all entries from that order onwards. You can always use .Skip(1) if you want to skip that:
// Skip the exact match
var orders = purchaseOrders.SkipWhile(x => !x.OurRef.Equals(...)).Skip(1);
This will be linear, mind you... if the list is ordered by x.OurRef you could find the index with a binary search and take the range from there onwards... but I wouldn't do that unless you find that the simpler code causes you problems.
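A small sketch of the two variants on made-up data. The refs list and the IsMatch helper are hypothetical; plain strings stand in for the OurRef values of the purchase orders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical references standing in for the OurRef values of purchaseOrders.
var refs = new List<string> { "PO-1", "PO-2", "PO-3", "PO-4" };
var lastPurchaseOrder = "po-2";

// Hypothetical helper wrapping the case-insensitive comparison from the question.
bool IsMatch(string r) => r.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase);

var fromMatch = refs.SkipWhile(r => !IsMatch(r)).ToList();          // includes the match
var afterMatch = refs.SkipWhile(r => !IsMatch(r)).Skip(1).ToList(); // excludes it

Console.WriteLine(string.Join(", ", fromMatch));  // PO-2, PO-3, PO-4
Console.WriteLine(string.Join(", ", afterMatch)); // PO-3, PO-4
```

If nothing matches, SkipWhile skips everything and both results are empty.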
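Probably you should take a look at LINQ's combination of Reverse and TakeWhile methods, if I understand your question correctly.
It may look like purchaseOrders.AsEnumerable().Reverse().TakeWhile(x => !x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase)). Note that this yields the items in reverse order, and that on a List<T> a bare Reverse() call binds to the in-place List<T>.Reverse() method, which returns void; hence the AsEnumerable().
Sorry if the code is unformatted, I'm on mobile right now.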
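Maybe you want something like this:
int itemIndex = list.IndexOf(list.FirstOrDefault(x => x.OurRef.Equals(lastPurchaseOrder, StringComparison.OrdinalIgnoreCase)));
// i > itemIndex keeps only the items after the match; if nothing matches, itemIndex is -1 and every item is kept
var newList = list.Where((f, i) => i > itemIndex);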

Removing from a collection while in a foreach with linq

From what I understand, this seems to not be a safe practice...
I have a foreach loop on my list object that I am stepping through. Inside that foreach loop I am looking up records by an Id. Once I have that new list of records returned by that Id I do some parsing and add them to a new list.
What I would like to do is not step through the same Id more than once. So my thought process would be to remove it from the original list. However, this causes an error... and I understand why.
My question is... Is there a safe way to go about this? or should I restructure my thought process a bit? I was wondering if anyone had any experience or thoughts on how to solve this issue?
Here is a little pseudocode:
_myList.ForEach(x =>
{
List<MyModel> newMyList = _myList.FindAll(y => y.SomeId == x.SomeId).ToList();
//Here is where I would do some work with newMyList
//Now I am done... time to remove all records with x.SomeId
_myList.RemoveAll(y => y.SomeId == x.SomeId);
});
I know that _myList.RemoveAll(y => y.SomeId == x.SomeId); is wrong, but in theory that would kinda be what I would be looking for.
I have also toyed around with the idea of pushing the used SomeId to an idList and then have it check each time, but that seems cumbersome and was wondering if there was a nicer way to handle what I am looking to do.
Sorry if I didn't explain this that well. If there are any questions, please feel free to comment and I will answer/make edits where needed.
First off, using ForEach in your example isn't a great idea for these reasons.
You're right to think there are performance downsides to iterating through the full list for each remaining SomeId, but even making the list smaller every time would still require another full iteration of that subset (if it even worked).
As was pointed out in the comments, GroupBy on SomeId organizes the elements into groupings for you, and allows you to efficiently step through each subset for a given SomeId, like so:
_myList.GroupBy(x => x.SomeId)
.Select(g => DoSomethingWithGroupedElements(g));
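For a concrete picture, a minimal sketch on made-up data; the tuples stand in for MyModel, and processedIds is a hypothetical name used only to show that each id is visited once. One thing to keep in mind: a Select over the groups is deferred, so nothing runs until the result is enumerated, whereas a foreach runs right away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records; tuples stand in for MyModel.
var myList = new List<(int SomeId, string Value)>
{
    (1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")
};
var processedIds = new List<int>();

// GroupBy makes a single pass over myList; the original list is never modified.
foreach (var group in myList.GroupBy(x => x.SomeId))
{
    // group.Key is the shared SomeId; the group itself enumerates
    // every record with that id, in their original order.
    processedIds.Add(group.Key);
    Console.WriteLine($"{group.Key}: {string.Join(", ", group.Select(x => x.Value))}");
}
// 1: a, c
// 2: b, e
// 3: d
```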
Jon Skeet has an excellent set of articles about how the Linq extensions could be implemented. I highly recommend checking it out for a better understanding of why this would be more efficient.
First of all, you can't change a list while a foreach is enumerating it: adding or removing elements invalidates the enumerator, and the next iteration throws an InvalidOperationException. There are a few ways you could handle this situation:
GroupBy
This is the method I would use. You can group your list by the property you want, and iterate through the IGrouping formed this way
var groups = list.GroupBy(x => x.yourProperty);
foreach(var group in groups)
{
//your code
}
Distinct properties list
You could also save properties in another list, and cycle through that list instead of the original one
var propsList = list.Select(x=>x.yourProperty).Distinct();
foreach(var prop in propsList)
{
var tmpList = list.Where(x=>x.yourProperty == prop);
//your code
}
While loop
This will actually do what you originally wanted, but performance may not be optimal
while(list.Any())
{
var prop = list.First().yourProperty;
var tmpList = list.Where(x=>x.yourProperty == prop).ToList(); // materialize now; RemoveAll below would empty a lazy query
//your code
list.RemoveAll(x=>x.yourProperty == prop);
}

linq: separate orderby and thenby statements

I'm coding through the 101 Linq tutorials from here:
http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
Most of the examples are simple, but this one threw me for a loop:
[Category("Ordering Operators")]
[Description("The first query in this sample uses method syntax to call OrderBy and ThenBy with a custom comparer to " +
"sort first by word length and then by a case-insensitive sort of the words in an array. " +
"The second two queries show another way to perform the same task.")]
public void Linq36()
{
string[] words = { "aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry", "b1" };
var sortedWords =
words.OrderBy(a => a.Length)
.ThenBy(a => a, new CaseInsensitiveComparer());
// Another way. TODO is this use of ThenBy correct? It seems to work on this sample array.
var sortedWords2 =
from word in words
orderby word.Length
select word;
var sortedWords3 = sortedWords2.ThenBy(a => a, new CaseInsensitiveComparer());
No matter which combination of words I throw at it the length is always the first ordering criterion ... even though I don't know how the second statement (with no orderby!) knows what the original orderby clause was.
Am I going crazy? Can anyone explain how Linq "remembers" what the original ordering was?
The return type of OrderBy is not IEnumerable<T>. It's IOrderedEnumerable<T>. This is an object that "remembers" all of the orderings it's been given, and as long as you don't call another method that turns the variable back into an IEnumerable it will retain that knowledge.
See Jon Skeet's wonderful blog series Edulinq, in which he re-implements LINQ to Objects, for more info. The key entries on OrderBy/ThenBy are:
IOrderedEnumerable
OrderBy, OrderByDescending, ThenBy, ThenByDescending
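A short sketch of how the static type carries the ordering. It uses the built-in StringComparer.OrdinalIgnoreCase in place of the sample's CaseInsensitiveComparer, which is a substitution made here so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry", "b1" };

// The query expression compiles to words.OrderBy(word => word.Length),
// so its static type is IOrderedEnumerable<string>, not just IEnumerable<string>.
IOrderedEnumerable<string> byLength =
    from word in words
    orderby word.Length
    select word;

// ThenBy is an extension method on IOrderedEnumerable<T>; it adds a tie-breaker
// to the ordering the object already carries.
var byLengthThenText = byLength.ThenBy(w => w, StringComparer.OrdinalIgnoreCase).ToList();

// Once the static type is plain IEnumerable<string>, ThenBy is no longer offered:
IEnumerable<string> plain = byLength;
// plain.ThenBy(w => w);   // does not compile
// A fresh OrderBy replaces the primary ordering instead of refining it.
var resorted = plain.OrderBy(w => w, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(", ", byLengthThenText));
// b1, aPPLE, AbAcUs, bRaNcH, cHeRry, ClOvEr, BlUeBeRrY
```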
This is because LINQ is lazy: all of the evaluation happens only when you enumerate the sequence, and at that point the query that has been constructed gets executed.
Your question really doesn't make much sense on the surface because you're not considering the nature of the deferred execution. It doesn't "remember" in either case truthfully, it simply isn't executed until it's really needed. If you run over your examples in the debugger you will find that these generate identical (structurally anyway) statements. Consider:
var sortedWords =
words.OrderBy(a => a.Length)
.ThenBy(a => a, new CaseInsensitiveComparer());
You've explicitly told it to OrderBy, ThenBy. Each statement is stacked on until they're all complete, and the final query is constructed to look like (pseudo):
Select from words, order by length, then by comparer
Then once that is all ready to go, sortedWords holds the query, and it is executed when you enumerate it. Now consider:
var sortedWords2 =
from word in words
orderby word.Length // You're telling it to sort here
select word;
// Now you're telling it to ThenBy here
var sortedWords3 = sortedWords2.ThenBy(a => a, new CaseInsensitiveComparer());
And then once those queries are stacked up it will be executed. However, it WON'T be executed until you NEED them. sortedWords3 won't really have any value until you act on it because the need for it is deferred. So in both cases, you're basically saying to the compiler:
Wait until I'm done building my query
Select from source
Order by length
Then by comparer
Ok do your stuff.
Note: To sum up, LINQ doesn't "remember", it simply doesn't execute until you're done giving it instructions to execute. Then it stacks them up into a query and runs them all at once when they're needed.
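The deferred part is easy to see in a tiny sketch on made-up data: the source is changed after the query is defined, and the later additions still show up in the result because the sort only runs at ToList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new List<string> { "pear", "fig" };

// Defining the query sorts nothing; it only records what to do.
var query = words.OrderBy(w => w.Length);

// Change the source after the query has been defined.
words.Add("kiwi");
words.Add("a");

// The sort runs here, over the list as it is now.
var result = query.ToList();

Console.WriteLine(string.Join(", ", result)); // a, fig, pear, kiwi
```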
