I'm a bit surprised to find the results of the following code, where I simply want to remove all 3s from a sequence of ints:
var sequence = new [] { 1, 1, 2, 3 };
var result = sequence.SkipWhile(i => i == 3); // Oh noes! Returns { 1, 1, 2, 3 }
Why isn't 3 skipped?
My next thought was, OK, the Except operator will do the trick:
var sequence = new [] { 1, 1, 2, 3 };
var result = sequence.Except(new[] { 3 }); // Oh noes! Returns { 1, 2 }
In summary,
- Except removes the 3, but also removes non-distinct elements. Grr.
- SkipWhile doesn't skip the last element, even if it matches the condition. Grr.
Can someone explain why SkipWhile doesn't skip the last element? And can anyone suggest what LINQ operator I can use to remove the '3' from the sequence above?
It's not broken. SkipWhile only skips items at the beginning of the IEnumerable<T>. Once the condition fails, it happily takes the rest of the elements; later elements that match the predicate won't be skipped.
int[] sequence = { 3, 3, 1, 1, 2, 3 };
var result = sequence.SkipWhile(i => i == 3);
// Result: 1, 1, 2, 3
var result = sequence.Where(i => i != 3); // Result: 1, 1, 2
The SkipWhile and TakeWhile operators skip or return elements from a sequence while a predicate function passes (returns true). The first element that fails the predicate ends the evaluation.
As the documentation for SkipWhile puts it: "Bypasses elements in a sequence as long as a specified condition is true and then returns the remaining elements."
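To make the split point concrete, here's a minimal sketch (names are mine) contrasting TakeWhile and SkipWhile on the same sequence:

```csharp
using System;
using System.Linq;

class SkipTakeDemo
{
    static void Main()
    {
        int[] sequence = { 3, 3, 1, 1, 2, 3 };

        // TakeWhile yields elements until the predicate first fails.
        var taken = sequence.TakeWhile(i => i == 3);

        // SkipWhile discards that same leading run, then yields everything
        // else - including the trailing 3, which is never re-tested.
        var skipped = sequence.SkipWhile(i => i == 3);

        Console.WriteLine(string.Join(", ", taken));   // 3, 3
        Console.WriteLine(string.Join(", ", skipped)); // 1, 1, 2, 3
    }
}
```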
One solution you may find useful is the List<T>.FindAll method.
List<int> aggregator = new List<int> { 1, 2, 3, 3, 3, 4 };
List<int> result = aggregator.FindAll(b => b != 3);
Ahmad already answered your question, but here's another option:
var result = from i in sequence where i != 3 select i;
Related
I want to convert the following structured code into a more readable LINQ call:
foreach (string line in this.HeaderTexts)
{
    Match match = dimensionsSearcher.Match(line);
    if (match.Success)
    {
        // Do something
        return;
    }
}
I came up with the following code:
Match foundMatch = this.HeaderTexts
    .Select(text => dimensionsSearcher.Match(text))
    .Where(match => match.Success)
    .FirstOrDefault();
if (foundMatch != null)
{
    // Do something
    return;
}
However, from my understanding, this will run the regex check for each header text, while my first version breaks out as soon as it hits the first match. Is there a way to optimize the LINQ version of that code, or should I rather stick to the structured code?
Let's say you have a list of integers; you need to add 2 to each number, then find the first one that is even.
var input = new[] { 1, 2, 3, 4, 5, 6 };
var firstEvenNumber = input
.Select(x => x + 2)
.Where(x => x % 2 == 0)
.First();
// firstEvenNumber is 4, which is the input "2" plus two
Now, does the Select evaluate x + 2 on every input before First runs? Let's find out. We can replace the lambda in Select with a multi-line version that prints to the console when it's evaluated.
var input = new[] { 1, 2, 3, 4, 5, 6 };
var firstEvenNumber = input
    .Select(x => {
        Console.WriteLine($"Processing {x}");
        return x + 2;
    })
    .Where(x => x % 2 == 0)
    .First();
Console.WriteLine("First even number is " + firstEvenNumber);
This prints:
Processing 1
Processing 2
First even number is 4
So it looks like LINQ only evaluated the minimum number of entries needed to satisfy Where and First.
Where and First don't need all the processed records up front before passing results to the next step, unlike Reverse(), ToList(), OrderBy(), etc., which must consume the entire sequence first.
If you instead stuck a ToList() before First, it would be a different story.
var input = new[] { 1, 2, 3, 4, 5, 6 };
var firstEvenNumber = input
    .Select(x => {
        Console.WriteLine($"Processing {x}");
        return x + 2;
    })
    .Where(x => x % 2 == 0)
    .ToList() // same thing if you put it before Where instead
    .First();
Console.WriteLine("First even number is " + firstEvenNumber);
This prints:
Processing 1
Processing 2
Processing 3
Processing 4
Processing 5
Processing 6
First even number is 4
Your LINQ query does what you hope it does. It will only execute the regex until one header matches. So it has the same behavior as your loop. That's ensured with FirstOrDefault (or First). You could rewrite it to:
Match foundMatch = this.HeaderTexts
    .Select(text => dimensionsSearcher.Match(text))
    .FirstOrDefault(m => m.Success);
// ...
Note that Single and SingleOrDefault ensure that there is at most one match (otherwise they throw an InvalidOperationException), so they might need to enumerate everything, because they have to check whether there is a second match.
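A quick sketch of that difference (the logging Select is just there to show what gets enumerated): First can stop at the first match, while Single has to scan to the end to rule out a second one.

```csharp
using System;
using System.Linq;

class SingleVsFirstDemo
{
    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5 };

        // First stops as soon as it finds a match: enumerates 1, 2, 3.
        var first = numbers
            .Select(x => { Console.WriteLine($"First saw {x}"); return x; })
            .First(x => x == 3);

        // Single keeps going to verify there is no second match:
        // enumerates all five elements.
        var single = numbers
            .Select(x => { Console.WriteLine($"Single saw {x}"); return x; })
            .Single(x => x == 3);
    }
}
```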
Read this blog if you want to understand how lazy evaluation (deferred execution) works:
https://codeblog.jonskeet.uk/category/edulinq/
I have a piece of code that I am struggling with figuring out. Not sure what's going on. It should return the most occurring numbers within an array (and it does).
It outputs the following => [2, 3].
I have tried to make my question as readable as possible; sorry for any eye-strain.
I am struggling to understand the following code:
.GroupBy(..., numbersGroup => numbersGroup.Key)
.OrderByDescending(supergroup => supergroup.Key)
.First()
Could someone help explain this code to me?
I will write down comments inside of the code as far as I have understood it.
int[] numbers1 = { 1, 2, 3, 3, 2, 4 };
// First in GroupBy(x => x) I group all numbers within the array (remove all
// duplicates too?), now my array looks like this [1,2,3,4].
int[] result = numbers1.GroupBy(x => x)
// In GroupBy(numbersGroup => numbersGroup.Count()) I collect all the
// different amounts of occurrences within the array; that would be 1 (for 1, 4)
// and 2 (for 2, 3), so my array should look like this now [1, 2].
// Now this is where things get out of hand, what happens at the rest of it? I
// have tried for 4 hours now and can't figure it out. What exactly happens in
// numbersGroup => numbersGroup.Key? .OrderByDescending(supergroup => supergroup.Key)?
    .GroupBy(numbersGroup => numbersGroup.Count(), numbersGroup => numbersGroup.Key)
    .OrderByDescending(supergroup => supergroup.Key)
    .First()
    .ToArray();
Code with my comments:
int[] numbers1 = { 1, 2, 3, 3, 2, 4 };
// First in GroupBy(x => x) all numbers are grouped by their values, so now data is IGrouping<int, int> query like this (formatted as a dict for readability in format {key: value}): {1: [1], 2: [2, 2], 3: [3, 3], 4: [4]} - int is key, value is occurrences list.
int[] result = numbers1.GroupBy(x => x)
// again, do GroupBy by elements count in group. You will get something like this: {1: [1, 4], 2: [2, 3]} - elements count is key, value is array of prev keys
.GroupBy(numbersGroup => numbersGroup.Count(), numbersGroup => numbersGroup.Key)
// sort groups by elements count descending: {2: [2, 3], 1: [1, 4]}
.OrderByDescending(supergroup => supergroup.Key)
// select group with max key (2): [2, 3]
.First()
// create array from this group: [2, 3]
.ToArray();
Every once in a while, I'll come across code that has a lot of chaining, like:
int[] noIdeaWhyThisIsAnArray = Something.DosomethingElse().AndThenAnotherThing().Etc();
Whenever I have trouble understanding this, I break it into steps and use the "var" keyword to simplify things:
var step1 = Something.DosomethingElse();
var step2 = step1.AndThenAnotherThing();
var step3 = step2.Etc();
Then add a breakpoint after step3 is assigned, run the debugger/application, and then start checking out the variables in the Locals tab. In your case, the code would look like:
int[] numbers1 = { 1, 2, 3, 3, 2, 4 };
var step1 = numbers1.GroupBy(x => x);
var step2 = step1.GroupBy(numbersGroup => numbersGroup.Count(), numbersGroup => numbersGroup.Key);
var step3 = step2.OrderByDescending(supergroup => supergroup.Key);
var step4 = step3.First();
var step5 = step4.ToArray();
That said, to answer your specific question:
The first GroupBy simply creates groups of each value/number. So all the 1s go into the first group, then all the 2s go into the next group, all the 3s go into the next group, etc. For example, if you expand step1 in the Locals tab, you can see the second group has two entries in it - both of them containing "2".
So at this point, there are a total of 4 groups (because there are 4 unique values). Two of the groups have 1 value each, and then the other two groups have 2 values each.
The next step then groups them by that count, so you end up with two groups, where the key indicates how many of each item there are. So the first group has a key of 1, and two values - "1" and "4", which means "1" and "4" both showed up once. The second group has a key of 2, and two values - "2" and "3", which means that "2" and "3" both showed up twice.
The third step orders that result in descending order of the key (and remember, the key indicates how many times those values showed up), so the MOST-frequently-occurring number(s) will be the first element in the result, and the LEAST-frequently-occurring number(s) will be the last element in the result.
The fourth step just takes that first result, which again, is the list of MOST-frequently-occurring numbers, in this case "2" and "3".
Finally, the fifth step takes that list and converts it to an array so instead of it being a Linq grouping object, it's just a simple array of those two numbers.
Here's an example of a solution I came up with:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
        var countlist = arr.Aggregate(new Dictionary<int, int>(), (D, i) => {
                D[i] = D.ContainsKey(i) ? (D[i] + 1) : 1;
                return D;
            })
            .AsQueryable()
            .OrderByDescending(x => x.Value)
            .Select(x => x.Key)
            .ToList();
        // print the element which appears with the second
        // highest frequency in arr
        Console.WriteLine(countlist[1]); // should print 3
    }
}
At the very least, I would like to figure out how to:
1. Cut down the query clauses by at least one. While I don't see any redundancy, this is the type of LINQ query where I fret about the overhead of all the intermediate structures created.
2. Not return an entire list at the end. I just want the 2nd element in the enumerated sequence; I shouldn't need to build the entire list to get a single element out of it.
int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
var lookup = arr.ToLookup(t => t);
var result = lookup.OrderByDescending(t => t.Count());
Console.WriteLine(result.ElementAt(1).Key);
I would do this.
int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
int rank = 2;
var item = arr.GroupBy(x => x)                   // Group them
              .OrderByDescending(x => x.Count()) // Sort based on number of occurrences
              .Skip(rank - 1)                    // Traverse to the position
              .FirstOrDefault();                 // Take the element
if (item != null)
{
    Console.WriteLine(item.Key);
    // output - 3
}
I started to answer, saw the above answers and thought I'd compare them instead.
Here is the Fiddle.
I put a stopwatch on each and took the number of ticks for each one. The results were:
Original: 50600
Berkser: 15970
Tommy: 3413
Hari: 1601
user3185569: 1571
It appears user3185569 has a slightly faster algorithm than Hari and is about 30-40 times quicker than the OP's original version. Note that in user3185569's answer above, it appears his is faster when scaled.
update: The numbers I posted above were run on my pc. Using .net fiddle to execute produces different results:
Original: 46842
Berkser: 44620
Tommy: 11922
Hari: 13095
user3185569: 16491
Putting the Berkser algorithm slightly faster. I'm not entirely clear why this is the case, as I'm targeting the same .NET version.
I came up with the following mash of LINQ and a dictionary, as what you're looking for is essentially an ordered dictionary:
void Run()
{
    int[] arr = new int[] { 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
    int[] unique = arr.Distinct().ToArray();
    Dictionary<int, int> dictionary = unique.ToDictionary(k => k, v => 0);
    for (int i = 0; i < arr.Length; i++)
    {
        if (dictionary.ContainsKey(arr[i]))
        {
            dictionary[arr[i]]++;
        }
    }
    List<KeyValuePair<int, int>> solution = dictionary.ToList();
    solution.Sort((x, y) => -1 * x.Value.CompareTo(y.Value));
    System.Console.WriteLine(solution[1].Key); // second most frequent: 4
}
Let's say I have this array in C#:
int[] myList = { 1, 4, 6, 8, 3, 3, 3, 3, 8, 9, 0 };
I want to know if a value (lets say from 0-9) is next to itself in the list and how many times. In this case, the value 3 is next to itself and it has 4 repetitions. If I have a list {0,1,2,3,4,5,5,6,7} the value 5 is next to itself and has 2 repetitions.
Repetitions have a limit of 5; no value can be repeated more than 5 times. The furthest I got was writing if statements, but I know there's a better way of doing it.
The question isn't written all that clearly, but here's an answer:
int lastValue = myList[0];
int times = 0;
foreach (int value in myList)
{
    if (lastValue == value)
    {
        times++;
    }
    else if (times <= 1)
    {
        lastValue = value;
        times = 1;
    }
    else
    {
        break;
    }
}
You only have to iterate on your list and keep a counter that will count only the consecutive duplicate integer.
If you want a neater solution, you might look at an open source library called morelinq (by Jon Skeet and a few others), available on NuGet. It has useful extension methods for LINQ.
One of them is called GroupAdjacent, which is applicable to your problem.
var testList = new[] { 1, 4, 6, 8, 3, 3, 3, 3, 8, 9, 0 };
var groups = testList.GroupAdjacent(t => t);
var groupsWithMoreThanOneMember = groups.Where(g => g.Count() > 1);
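For completeness, a sketch of how that might be used end-to-end (assumes the morelinq NuGet package is referenced; the class and variable names are mine):

```csharp
using System;
using System.Linq;
using MoreLinq; // from the "morelinq" NuGet package

class GroupAdjacentDemo
{
    static void Main()
    {
        var testList = new[] { 1, 4, 6, 8, 3, 3, 3, 3, 8, 9, 0 };

        // GroupAdjacent only groups neighbouring equal elements,
        // so the two separate 8s end up in separate groups.
        var runs = testList
            .GroupAdjacent(t => t)
            .Where(g => g.Count() > 1);

        foreach (var run in runs)
            Console.WriteLine($"{run.Key} repeats {run.Count()} times"); // 3 repeats 4 times
    }
}
```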
Having just spent over an hour debugging a bug in our code which in the end turned out to be something about the Enumerable.Except method which we didn't know about:
var ilist = new[] { 1, 1, 1, 1 };
var ilist2 = Enumerable.Empty<int>();
ilist.Except(ilist2); // returns { 1 } as opposed to { 1, 1, 1, 1 }
or more generally:
var ilist3 = new[] { 1 };
var ilist4 = new[] { 1, 1, 2, 2, 3 };
ilist4.Except(ilist3); // returns { 2, 3 } as opposed to { 2, 2, 3 }
Looking at the MSDN page:
"This method returns those elements in first that do not appear in second. It does not also return those elements in second that do not appear in first."
I get it that in cases like this:
var ilist = new[] { 1, 1, 1, 1 };
var ilist2 = new[] { 1 };
ilist.Except(ilist2); // returns an empty array
you get the empty array because every element in the first array 'appears' in the second and therefore should be removed.
But why do we only get distinct instances of all other items that do not appear in the second array? What's the rationale behind this behaviour?
I certainly cannot say for sure why they decided to do it that way. However, I'll give it a shot.
MSDN describes Except as this:
"Produces the set difference of two sequences by using the default equality comparer to compare values."
A set is described as this:
"A set is a collection of distinct objects, considered as an object in its own right."
Since a set contains only distinct elements, a set difference is itself a set, which is why Except yields only distinct instances of the surviving elements.
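Under that reading, Except treats its first argument as a set of distinct values. A minimal sketch of the observable behaviour (the Distinct/Where chain is just an illustration of the semantics, not how it's actually implemented - the real method uses a hash set):

```csharp
using System;
using System.Linq;

class ExceptDemo
{
    static void Main()
    {
        var first = new[] { 1, 1, 2, 2, 3 };
        var second = new[] { 1 };

        var except = first.Except(second);

        // Behaves like: distinct elements of first, minus those in second.
        var manual = first.Distinct().Where(x => !second.Contains(x));

        Console.WriteLine(string.Join(", ", except)); // 2, 3
        Console.WriteLine(string.Join(", ", manual)); // 2, 3
    }
}
```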