Reorder by original index when using Linq - c#

I've looked around for a solution to this problem and, although I found similar questions, I couldn't find an answer to this specific case. I've generalised the problem, but it goes as follows:
I have the following int[]:
[423]
[234]
[5]
[79]
[211]
[1001]
I would like to use LINQ to select only the entries that are less than 200 or greater than 300, and then order by the original array index so that the final array is guaranteed to be:
[423]
[5]
[79]
[1001]

LINQ to Objects preserves the source order when filtering, so a simple Where clause will do the job.
Order Preservation in PLINQ
In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved; however, ordering can be computationally expensive. Therefore, by default, PLINQ does not preserve the order of the source sequence. In this regard, PLINQ resembles LINQ to SQL, but is unlike LINQ to Objects, which does preserve ordering.
But if you want to make the ordering explicit, you can select the index along with the value and then use OrderBy on the index:
int[] array = new []
{
423,234,5,79,211,1001
};
var sortedArray = array.Select((r, i) => new { value = r, index = i })
.Where(t => t.value < 200 || t.value > 300)
.OrderBy(o => o.index)
.Select(s => s.value).ToArray();

When you filter objects with Enumerable.Where, the original order is preserved. From MSDN:
LINQ to Objects does preserve ordering.
A few more words: you can think of Where as simple filtering of elements in a foreach loop, which returns items one by one, in exactly the same order in which they enter the loop. Like this:
public static IEnumerable<T> Where<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
{
foreach(var item in sequence)
if (predicate(item))
yield return item;
}
Read more on Jon's blog.

No need to do any sorting; the order is preserved.
var someInts = new int[] { 423, 234, 5, 79, 211, 1001 };
var filteredInts = someInts.Where(i => i < 200 || i > 300);
// filteredInts = [423, 5, 79, 1001]
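A minimal sketch to check the claim above on the sample data: it runs the plain Where filter and the index-tagging variant side by side and compares the results. The variable names are illustrative only.

```csharp
using System;
using System.Linq;

int[] source = { 423, 234, 5, 79, 211, 1001 };

// Plain filter: LINQ to Objects yields matches in source order.
int[] filtered = source.Where(v => v < 200 || v > 300).ToArray();

// Explicit variant: carry the original index along and sort by it.
int[] viaIndex = source
    .Select((value, index) => new { value, index })
    .Where(t => t.value < 200 || t.value > 300)
    .OrderBy(t => t.index)
    .Select(t => t.value)
    .ToArray();

Console.WriteLine(string.Join(", ", filtered));      // 423, 5, 79, 1001
Console.WriteLine(filtered.SequenceEqual(viaIndex)); // True
```

For an in-memory array the OrderBy step is redundant; it only becomes relevant for providers that do not preserve order, such as PLINQ without AsOrdered.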

Related

Implicit Index or Generating Index in LINQ

I recently encountered a couple of cases that made me wonder whether there is any way to get the internal index or, if not, to generate an index efficiently when using LINQ.
Consider the following case:
List<int> list = new List<int>() { 7, 4, 5, 1, 7, 8 };
In C#, if we want to return the indexes of "7" in the above list using LINQ, we cannot simply use IndexOf/LastIndexOf, because each returns only the first/last result.
This will not work:
var result = list.Where(x => x == 7).Select(y => list.IndexOf(y));
We have several workarounds for doing it, however.
For instance, one way of doing it could be this (which I typically do):
int index = 0;
var result = from x in list
let indexNow = index++
where x == 7
select indexNow;
This introduces an additional range variable, indexNow, by using let.
And also another way:
var result = Enumerable.Range(0, list.Count).Where(x => list.ElementAt(x) == 7);
This generates a second sequence with Enumerable.Range and picks the results from it.
Now, I am wondering if there is any alternative, simpler way of doing it, such as:
a (built-in) way to get the internal index of the IEnumerable without declaring something with let, or
a way to generate the index for the IEnumerable using something other than Enumerable.Range (perhaps something like new, which I am not too familiar with), or
anything else that could shorten the code while still getting the indexes of the IEnumerable.
From IEnumerable.Select with index, How to get index using LINQ?, and so on: IEnumerable<T>.Select() has an overload that provides an index.
Using this, you can:
Project into an anonymous type (or a Tuple<int, T> if you want to expose it to other methods, anonymous types are local) that contains the index and the value.
Filter those results for the value you're looking for.
Project only the index.
So the code will look like this:
var result = list.Select((v, i) => new { Index = i, Value = v })
.Where(i => i.Value == 7)
.Select(i => i.Index);
Now result will contain an enumerable containing all indexes.
Note that this will only work for source collection types that guarantee insertion order and provide a numeric indexer, such as List<T>. When you use this for a Dictionary<TKey, TValue> or HashSet<T>, the resulting indexes will not be usable to index into the dictionary.
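As a sketch of the difference, the following runs the IndexOf attempt from the question next to the indexed Select overload on the same list. The variable names viaIndexOf and viaSelect are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 7, 4, 5, 1, 7, 8 };

// IndexOf searches from the start each time, so both matches report index 0.
List<int> viaIndexOf = list.Where(x => x == 7).Select(y => list.IndexOf(y)).ToList();

// The indexed Select overload pairs each value with its position first.
List<int> viaSelect = list
    .Select((value, index) => new { Index = index, Value = value })
    .Where(p => p.Value == 7)
    .Select(p => p.Index)
    .ToList();

Console.WriteLine(string.Join(", ", viaIndexOf)); // 0, 0
Console.WriteLine(string.Join(", ", viaSelect));  // 0, 4
```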

LINQ to find array indexes of a value

Assuming I have the following string array:
string[] str = new string[] {"max", "min", "avg", "max", "avg", "min"};
Is it possible to use LINQ to get a list of indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].
.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new {i, s})
.Where(t => t.s == "avg")
.Select(t => t.i)
.ToList()
The result will be a list containing 2 and 4.
Documentation here
You can do it like this:
str.Select((v,i) => new {Index = i, Value = v}) // Pair up values and indexes
.Where(p => p.Value == "avg") // Do the filtering
.Select(p => p.Index); // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.
You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
.Where(x => x.s== "avg")
.Select(x => x.index)
.ToList();
If you just want to find the first or last index, there are also built-in methods: List<T>.IndexOf and List<T>.LastIndexOf for a list, or Array.IndexOf and Array.LastIndexOf for an array such as str:
int firstIndex = Array.IndexOf(str, "avg");
int lastIndex = Array.LastIndexOf(str, "avg");
(Or you can use the overload that takes a start index to specify where the search begins.)
First off, your code doesn't actually iterate over the list twice, it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Length)
.Where(i => str[i] == "avg")
.ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "Whenever someone asks me for an item I'll ask my input sequence for an item; if the function says it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them, what happens is this: ToList asks for the first item, so it goes to Where to ask it for its first item; Where goes to Select and asks it for its first item; Select goes to the list to ask it for its first item. The list then provides its first item. Select transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item and runs its function, which determines that it's true, and so spits out 0 to ToList, which adds it to the list. That whole sequence then repeats for every remaining item. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first to you, it's actually pretty easy for the computer to do all of this. It's not actually as performance intensive as it may seem at first.
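The pipeline behaviour described above can be observed with a small sketch that records each call in a trace list. The trace list and the three-element words array are assumptions made for illustration; they are not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "max", "avg", "min" };
var trace = new List<string>();

var query = words
    .Select((s, i) => { trace.Add($"Select {i}"); return new { s, i }; })
    .Where(p => { trace.Add($"Where {p.i}"); return p.s == "avg"; })
    .Select(p => p.i);

// Nothing has run yet: the query only describes the pipeline.
int stepsBefore = trace.Count;

// Enumeration happens here, and the source is walked exactly once.
List<int> result = query.ToList();

Console.WriteLine(stepsBefore); // 0
Console.WriteLine(string.Join(" | ", trace));
// Select 0 | Where 0 | Select 1 | Where 1 | Select 2 | Where 2
```

The interleaved trace shows each item travelling through the whole chain before the next one is requested, rather than Select finishing before Where starts.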
While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
if (source == null)
throw new ArgumentNullException("source");
int i = 0;
foreach (T item in source)
{
if (object.Equals(itemToFind, item))
{
yield return i;
}
i++;
}
}
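A usage sketch for a helper of this shape, written as a local function so it runs as a single file. One deliberate change, labelled here as an assumption about what is wanted: the null check is split from the iterator body, because code inside a yield-based method does not run until enumeration starts, so the check in a single iterator method would fire late. The inner name Iterate is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] str = { "max", "min", "avg", "max", "avg", "min" };

List<int> hits = Indexes(str, "avg").ToList();
Console.WriteLine(string.Join(", ", hits)); // 2, 4

// Validates eagerly, then hands off to the lazy iterator below.
static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
    if (source == null)
        throw new ArgumentNullException(nameof(source));
    return Iterate(source, itemToFind);

    static IEnumerable<int> Iterate(IEnumerable<T> items, T target)
    {
        int i = 0;
        foreach (T item in items)
        {
            // The default comparer avoids boxing value types.
            if (EqualityComparer<T>.Default.Equals(target, item))
                yield return i;
            i++;
        }
    }
}
```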
What you need is a combined select-and-where operator. Compared to the accepted answer this will be cheaper, since it does not allocate intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (var s in source)
{
checked{ ++index; }
if (filter(s))
yield return selector(s, index);
}
}

remove all element except from given index number

I have the following list. How can I use LINQ to remove all elements except the one at a given index?
List<string> a = new List<string>();
a.Add("number1");
a.Add("number2");
a.Add("number3");
How can I remove every element except the one at index 2 using LINQ?
LINQ isn't about removing things - it's about querying.
You can call RemoveRange to remove a range of items from a list though. So:
a.RemoveRange(0, 2);
will leave just "number3".
Or you could create a new list as per dasblinkenlight's answer. If you can tell us more about what you're trying to achieve and why you think LINQ is the solution, we may be able to help you more.
EDIT: Okay, now we have clearer requirements, you can use LINQ:
var newList = a.Where((value, index) => index != 2)
.ToList();
Assuming you have a list of the indices you want to keep, you can use Where with an index to filter:
var indexList = new[] {2};
var result = a.Where((s, index) => indexList.Contains(index));
An equivalent operation to "remove everything but X" is "keep X". The simplest way to do it is constructing a new list with a single element at index 2, like this:
a = new List<string>{a[2]};
Although @dasblinkenlight's answer is the better option, here is the LINQ version (or at least one iteration of it):
a.Where((item, index) => index == 2);
or, to return a single string object rather than an IEnumerable:
a.Where((a1, b1) => b1 == 2).First();
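Since the answers above mix "keep index 2" and "drop index 2", here is a small sketch showing both directions with the indexed Where overload. The keep set is an illustrative name, and a HashSet<int> is used only so that the lookup stays cheap if many indices are kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new List<string> { "number1", "number2", "number3" };
var keep = new HashSet<int> { 2 };

// Keep only the listed indices.
List<string> kept = a.Where((item, index) => keep.Contains(index)).ToList();

// Inverse filter: drop the listed indices and keep everything else.
List<string> dropped = a.Where((item, index) => !keep.Contains(index)).ToList();

Console.WriteLine(string.Join(", ", kept));    // number3
Console.WriteLine(string.Join(", ", dropped)); // number1, number2
```

Neither query modifies a; both build new lists, which matches the point that LINQ queries rather than removes.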

How to get an object based on 2 criteria without doing extensive searching?

It's hard to summarize this as a 1 line question. I have a class like this:
class Item
{
public int Count {get; set;}
public double Value {get; set;}
}
and a List<Item> that contains an arbitrary number of Item values.
How can I get the Item with the lowest Count and the highest Value?
Performance is not important but elegance is: I don't want huge nested loops for this operation unless there is no elegant way (LINQ, etc.) to do it.
EDIT:
Here is a sample list that could have these Items:
{Count, Value}
{2, 10}, {6, 60}, {5, 21}, {4, 65}, {2, 35}, {4, 18}, {3, 55}, {7, 99}, {2, 25}
So here the value I want is {2, 35} because it has the lowest Count of all items, and for the items with the same Count values, it has the highest Value.
Okay, now we've got a bit of clarity:
as long as it has the lowest Count, the highest Value in that range is OK
It's easy...
var best = list.OrderBy(x => x.Count)
.ThenByDescending(x => x.Value)
.FirstOrDefault();
That will give null if the list is empty, but otherwise the item with the lowest count, and the one with the highest value if there are multiple with that count.
This is less efficient than it might be, as we could really do it in a single pass by creating a comparer to compare any two values - but this will certainly be the approach which gets the right element in the least code.
Hopefully it works now, sadly not so elegant like Jon's
var res = (from i in items
orderby i.Count ascending, i.Value descending
select i).FirstOrDefault();
With morelinq (which has MinBy and MaxBy operators), you can accomplish this easily in linear time (unfortunately needs an extra pass):
IEnumerable<Item> items = ...
var minCount = items.Min(item => item.Count);
var minCountItemWithMaxValue = items.Where(item => item.Count == minCount)
.MaxBy(item => item.Value);
With an appropriate AllMinBy extension that returned all the minimum items in a sequence (missing in morelinq sadly), you could make this even more efficient:
var minCountItemWithMaxValue = items.AllMinBy(item => item.Count)
.MaxBy(item => item.Value);
EDIT:
Here's an (ugly) way to do it in O(1) space and O(n) time in a single pass with standard LINQ:
var minCountItemWithMaxValue = items.Aggregate(
(bestSoFar, next) =>
(next.Count < bestSoFar.Count) ? next :
(next.Count > bestSoFar.Count) ? bestSoFar :
(next.Value > bestSoFar.Value) ? next : bestSoFar);
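To check the single-pass Aggregate version against the sorting version, this sketch runs both on the sample data from the question. Value tuples stand in for the Item class purely to keep the example self-contained; that substitution is an assumption, not part of the original answers.

```csharp
using System;
using System.Linq;

(int Count, double Value)[] items =
{
    (2, 10), (6, 60), (5, 21), (4, 65), (2, 35), (4, 18), (3, 55), (7, 99), (2, 25)
};

// O(n log n): sort by Count ascending, then Value descending, take the first.
var viaSort = items.OrderBy(x => x.Count).ThenByDescending(x => x.Value).First();

// O(n): keep the best candidate seen so far while walking the sequence once.
var viaAggregate = items.Aggregate((best, next) =>
    next.Count < best.Count ? next :
    next.Count > best.Count ? best :
    next.Value > best.Value ? next : best);

Console.WriteLine(viaSort);      // (2, 35)
Console.WriteLine(viaAggregate); // (2, 35)
```

Note that this Aggregate overload (no seed) throws InvalidOperationException on an empty sequence, whereas the FirstOrDefault version returns the default value instead.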

Lambda expression to find difference

With the following data
string[] data = { "a", "a", "b" };
I'd very much like to find duplicates and get this result:
a
I tried the following code
var a = data.Distinct().ToList();
var b = a.Except(a).ToList();
Obviously this didn't work. I can see what is happening above, but I'm not sure how to fix it.
When runtime is no problem, you could use
var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();
Good old O(n^2) =)
Edit: Now for a better solution. =)
If you define a new extension method like
static class Extensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
HashSet<T> hash = new HashSet<T>();
foreach (T item in input)
{
if (!hash.Contains(item))
{
hash.Add(item);
}
else
{
yield return item;
}
}
}
}
you can use
var duplicates = data.Duplicates().Distinct().ToArray();
Use grouping; the performance of these methods is reasonably good. The only concern is the large memory overhead if you are working with large data sets.
from g in (from x in data group x by x)
where g.Count() > 1
select g.Key;
--OR if you prefer extension methods
data.GroupBy(x => x)
.Where(x => x.Count() > 1)
.Select(x => x.Key)
Where Count() == 1 that's your distinct items and where Count() > 1 that's one or more duplicate items.
Since LINQ is kind of lazy, if you don't want to reevaluate your computation you can do this:
var g = (from x in data group x by x).ToList(); // grouping result
// duplicates
from x in g
where x.Count() > 1
select x.Key;
// distinct
from x in g
where x.Count() == 1
select x.Key;
When creating the grouping a set of sets will be created. Assuming that it's a set with O(1) insertion the running time of the group by approach is O(n). The incurred cost for each operation is somewhat high, but it should equate to near linear performance.
Sort the data, iterate through it, and remember the last item. When the current item is the same as the last, it's a duplicate. This can easily be implemented either iteratively or with a lambda expression in O(n*log(n)) time.
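A minimal sketch of the sort-and-scan idea, written iteratively. The longer sample array and the choice of StringComparer.Ordinal are assumptions for illustration; each duplicated value is reported once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] data = { "a", "a", "b", "c", "b", "a" };

// Sorting puts equal items next to each other: a, a, a, b, b, c.
string[] sorted = data.OrderBy(s => s, StringComparer.Ordinal).ToArray();
var duplicates = new List<string>();

for (int i = 1; i < sorted.Length; i++)
{
    bool sameAsPrevious = sorted[i] == sorted[i - 1];
    bool alreadyReported = duplicates.Count > 0
        && duplicates[duplicates.Count - 1] == sorted[i];

    // Report a value the first time it repeats, and only then.
    if (sameAsPrevious && !alreadyReported)
        duplicates.Add(sorted[i]);
}

Console.WriteLine(string.Join(", ", duplicates)); // a, b
```

The sort dominates the cost; the scan itself is a single linear pass with no extra set.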
