Why does Enumerable.Except return DISTINCT items?

We just spent over an hour debugging a problem in our code that in the end turned out to be caused by a behaviour of the Enumerable.Except method we didn't know about:
var ilist = new[] { 1, 1, 1, 1 };
var ilist2 = Enumerable.Empty<int>();
ilist.Except(ilist2); // returns { 1 } as opposed to { 1, 1, 1, 1 }
or more generally:
var ilist3 = new[] { 1 };
var ilist4 = new[] { 1, 1, 2, 2, 3 };
ilist4.Except(ilist3); // returns { 2, 3 } as opposed to { 2, 2, 3 }
Looking at the MSDN page:
This method returns those elements in
first that do not appear in second. It
does not also return those elements in
second that do not appear in first.
I get it that in cases like this:
var ilist = new[] { 1, 1, 1, 1 };
var ilist2 = new[] { 1 };
ilist.Except(ilist2); // returns an empty array
you get the empty array because every element in the first array 'appears' in the second and therefore should be removed.
But why do we only get distinct instances of all other items that do not appear in the second array? What's the rationale behind this behaviour?

I certainly cannot say for sure why they decided to do it that way. However, I'll give it a shot.
MSDN describes Except as this:
Produces the set difference of two
sequences by using the default
equality comparer to compare values.
A Set is described as this:
A set is a collection of distinct
objects, considered as an object in
its own right
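If the duplicates matter, one possible workaround is to filter with Where against a HashSet instead of calling Except. This is only a minimal sketch: the class ExceptDemo and the helper ExceptKeepingDuplicates are names made up for illustration and are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Hypothetical helper (not a LINQ method): removes every element that
    // occurs in 'second' but keeps duplicates of whatever remains.
    public static IEnumerable<T> ExceptKeepingDuplicates<T>(
        this IEnumerable<T> first, IEnumerable<T> second)
    {
        var excluded = new HashSet<T>(second);
        return first.Where(item => !excluded.Contains(item));
    }

    public static void Main()
    {
        var first = new[] { 1, 1, 2, 2, 3 };
        var second = new[] { 1 };

        // Set difference: each remaining value is yielded once.
        Console.WriteLine(string.Join(", ", first.Except(second)));                  // 2, 3

        // Filtering keeps the repeated 2.
        Console.WriteLine(string.Join(", ", first.ExceptKeepingDuplicates(second))); // 2, 2, 3
    }
}
```

The HashSet keeps the lookup cheap; a plain second.Contains(item) would also work for small inputs.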

Related

How the first array entry differs from the second

I cannot understand the difference between these two array declarations with initialization:
int[] array = new int[3] { 1, 2, 3 };
int[] secondArray = { 1, 2, 3 };
They seem to do the same thing, maybe they work differently?

There is no difference in the result between the two lines shown:
int[] array = new int[3] { 1, 2, 3 };
int[] secondArray = { 1, 2, 3 };
However, there are practical differences between new int[n] {...} syntax and {...}:
Implicit type is not available for the alternative array initialiser:
var a1 = new int[3] { 1, 2, 3 }; // OK
var a2 = { 1, 2, 3 }; // Error: Cannot initialize an implicitly-typed variable with an array initializer
// BTW. You can omit the size
var a3 = new int[] { 1, 2, 3 }; // OK
With the alternative syntax you cannot specify the size, it's always inferred.
var a1 = new int[100]; // Array with 100 elements (all 0)
int[] a2 = { }; // Array with no elements

There is no difference in the compiled code between the two lines.

The second one is just a shortcut. Both statements have the same result. The shorter variant just wasn't available in early versions of C#.

The first one specifies the array size 3 explicitly; in the second one the size is inferred.
Specifying the size might be useful if you don't want to initialize the values.

There is no difference between these two array initialization syntaxes in terms of how the compiler translates them into IL (you can play with it at sharplab.io), and both are the same as the following one:
int[] thirdArray = new int[] { 1, 2, 3 };
The only difference appears when you use them with an already declared variable, i.e. you can use the 1st and 3rd forms to assign a new value to an existing array variable, but not the second one:
int[] arr;
arr = new int[3] { 1, 2, 3 }; // works
// arr = { 1, 2, 3 }; // won't compile
arr = new int[] { 1, 2, 3 }; // works
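As a quick check of the claims above, the following sketch builds the same array with each syntax and compares the contents. The class name ArrayInitDemo and the method AllFormsEqual are illustrative only.

```csharp
using System;
using System.Linq;

public static class ArrayInitDemo
{
    // Builds the same array with each syntax and compares the contents.
    public static bool AllFormsEqual()
    {
        int[] a = new int[3] { 1, 2, 3 }; // explicit size
        int[] b = { 1, 2, 3 };            // initializer only, size inferred
        int[] c = new[] { 1, 2, 3 };      // element type inferred as int

        return a.SequenceEqual(b) && b.SequenceEqual(c);
    }

    public static void Main()
    {
        Console.WriteLine(AllFormsEqual()); // True
    }
}
```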

After shuffling an array to the point where it matches the original, why does equating them return false?

Consider the following method of shuffling, given an array of objects a:
Take the first element from a and place it into b. Consider the index of this element inside b to be x.
Take the second element from a and place it in front of b[x], so that it is now in position b[x-1].
Take the third element from a and place it behind b[x], so that it is now in position b[x+1].
Take the fourth element from a and place it in front of b[x-1], so that it is now in position b[x-2].
Take the fifth element from a and place it behind b[x+1], so that it is now in position b[x+2].
Repeat this process until b has all of the elements from a in it in this new shuffled order.
I wrote some code which does this, shown below. It will continuously shuffle the array in the above process until the shuffled array matches the original array, and then return the number of shuffles.
public class BadShuffler
{
    public BadShuffler(object[] _arrayToShuffle)
    {
        originalArray = _arrayToShuffle;
        Arrays = new List<object[]>
        {
            originalArray
        };
    }

    private object[] originalArray;
    private int count;

    public List<object[]> Arrays { get; set; }

    public int Shuffle(object[] array = null)
    {
        if (array == null)
            array = originalArray;
        count++;
        object[] newArray = new object[array.Length];
        bool insertAtEnd = false;
        int midpoint = newArray.Length / 2;
        newArray[midpoint] = array[0];
        int newArrayInteger = 1;
        int originalArrayInteger = 1;
        while (newArray.Any(x => x == null))
        {
            if (insertAtEnd)
            {
                newArray[midpoint + newArrayInteger] = array[originalArrayInteger];
                newArrayInteger++;
            }
            else
            {
                newArray[midpoint - newArrayInteger] = array[originalArrayInteger];
            }
            originalArrayInteger++;
            insertAtEnd = !insertAtEnd;
        }
        Arrays.Add(newArray);
        return (newArray.All(x => x == originalArray[Array.IndexOf(newArray, x)])) ? count : Shuffle(newArray);
    }
}
While not being the prettiest thing in the world, it does the job. Example shown below:
Shuffled 6 times.
1, 2, 3, 4, 5, 6
6, 4, 2, 1, 3, 5
5, 1, 4, 6, 2, 3
3, 6, 1, 5, 4, 2
2, 5, 6, 3, 1, 4
4, 3, 5, 2, 6, 1
1, 2, 3, 4, 5, 6
However, if I give it an array of [1, 2, 3, 3, 4, 5, 6] it ends up throwing a StackOverflowException. When debugging, however, I have found that it does actually get to a point where the new shuffled array matches the original array, as shown below.
This then goes on to call Shuffle(newArray) again, even though all values in the array match each other.
What is causing this? Why does the Linq query newArray.All(x => x == originalArray[Array.IndexOf(newArray, x)]) return false?
Here is a DotNetFiddle link, which includes the code I used to print out the result(s)

You are comparing objects. objects are compared using referential equality with ==, not value equality. Your example uses numbers, but those numbers are boxed to an object implicitly due to the way your code is laid out.
To avoid this, you should use the .Equals() function (when comparing Objects).
newArray.All(x => x.Equals(originalArray[Array.IndexOf(newArray, x)]))
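To see why this only bites once the array contains duplicates, here is a small stand-alone sketch (BoxingDemo is an illustrative name, not code from the question). Each boxing of the same int creates a separate object, and Array.IndexOf locates an element by Equals, so for a repeated value it can return the position of a different box than the one being tested.

```csharp
using System;

public static class BoxingDemo
{
    public static void Main()
    {
        object first = 3;   // boxes the int into one object
        object second = 3;  // boxes it again into a different object

        Console.WriteLine(first == second);      // False: different references
        Console.WriteLine(first.Equals(second)); // True: same boxed value

        // With duplicates, Array.IndexOf returns the first element that is
        // Equals to the argument, which may be a different box.
        object[] items = { first, second };
        int index = Array.IndexOf(items, second);

        Console.WriteLine(index);                  // 0
        Console.WriteLine(items[index] == second); // False
    }
}
```

Without duplicates, every element is found at its own index, so the reference comparison happens to succeed because the shuffle only moves the same boxed objects around.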
You should also use generics in your class instead of littering object[] everywhere to ensure type safety - unless one of your aims with this shuffler is to allow the shuffler to shuffle arrays of mixed types (which seems doubtful since it would be hard to extract any useful information out of that).
Note that this behaviour is exhibited whenever you compare reference types with ==. One way to allow only value types to be passed to your structure (i.e., types that are compared by value rather than by reference) is to use the struct generic constraint. Because == is not defined for an arbitrary T, the comparison has to go through Equals. As an example:
class BadShuffler<T> where T : struct
{
    public bool Shuffle(T[] array)
    {
        ...
        return newArray.All(x => {
            var other = originalArray[Array.IndexOf(newArray, x)];
            // x == other does not compile for a generic T; Equals compares the values
            return x.Equals(other);
        });
    }
}
This would work as you expect.
SequenceEqual as mentioned in the comments is also a good idea, as your .All() call will say that [1, 2, 3] is equal to [1, 2, 3, 4], but [1, 2, 3, 4] will not be equal to [1, 2, 3] - both of these scenarios are incorrect and more importantly not commutative[1], which equality operations should be.
Just make sure you implement your own EqualityComparer if you go beyond using object[].
That said, I think you want to combine both approaches and use SequenceEqual with my approach, unless you need to shuffle objects (e.g., a deck of cards) rather than numbers?
As a side note, I would generally recommend returning a new, shuffled T[] rather than modifying the original one in-place.
[1]: Commutative means that an operation done one way can be done in reverse and you get the same result. Addition, for example, is commutative: you can sum 1, 2 and 3 together in any order but the outcome will always be 6.
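A short sketch of the SequenceEqual suggestion, under the assumption that the elements are boxed ints as in the question (SequenceEqualDemo and the variable names are illustrative). With the default comparer it compares element by element using Equals, and it also takes the lengths into account, so the result is the same in both directions.

```csharp
using System;
using System.Linq;

public static class SequenceEqualDemo
{
    public static void Main()
    {
        object[] original = { 1, 2, 3, 3 };
        object[] sameValues = { 1, 2, 3, 3 };
        object[] longer = { 1, 2, 3, 3, 4 };

        // Elements are compared with Equals, so separately boxed ints match.
        Console.WriteLine(original.SequenceEqual(sameValues)); // True

        // Different lengths are never equal, whichever side comes first.
        Console.WriteLine(original.SequenceEqual(longer));     // False
        Console.WriteLine(longer.SequenceEqual(original));     // False
    }
}
```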

Omitting c# new from jagged array initialization

From: http://msdn.microsoft.com/en-us/library/2s05feca.aspx
Notice that you cannot omit the new operator from the elements initialization because there is no default initialization for the elements:
int[][] jaggedArray3 =
{
new int[] {1,3,5,7,9},
new int[] {0,2,4,6},
new int[] {11,22}
};
What does it mean?
Why is it ok to omit new in:
int[] arrSimp = { 1, 2, 3 };
int[,] arrMult = { { 1, 1 }, { 2, 2 }, { 3, 3 } };
but not possible in:
int[][,] arrJagg = {new int[,] { { 1, 1} }, new int[,] { { 2, 2 } }, new int[,] { { 3, 3 } } };

First off, what a coincidence, an aspect of your question is the subject of my blog today:
http://ericlippert.com/2013/01/24/five-dollar-words-for-programmers-elision/
You've discovered a small "wart" in the way C# classifies expressions. As it turns out, the array initializer syntax {1, 2, 3} is not an expression. Rather, it is a syntactic unit that can only be used as part of another expression:
new[] { 1, 2, 3 }
new int[] { 1, 2, 3 }
new int[3] { 1, 2, 3 }
new int[,] { { 1, 2, 3 } }
... and so on
or as part of a collection initializer:
new List<int> { 1, 2, 3 }
or in a variable declaration:
int[] x = { 1, 2, 3 };
It is not legal to use the array initializer syntax in any other context in which an expression is expected. For example:
int[] x;
x = { 1, 2, 3 };
is not legal.
It's just an odd corner case of the C# language. There's no deeper meaning to the inconsistency you've discovered.

In essence the answer is "because they (meaning the language designers) chose not to". To quote Eric Lippert:
The same reason why every unimplemented feature is not implemented:
features are unimplemented by default. In order to become implemented
a feature must be (1) thought of, (2) designed, (3) specified, (4)
implemented, (5) tested, (6) documented and (7) shipped.
More technically, there is a good reason for it, and that is the definition of jagged arrays compared to one-dimensional and multi-dimensional arrays.
An array of one or more dimensions can be described in plain English as an X-dimensional array of T, whereas a jagged array has to be described as an array of arrays of T. In the second case, there is a loose coupling between the inner arrays and the outer array. That is, you can assign a new array to a position within the outer array, whereas an X-dimensional array is fixed.
Now that we know that jagged arrays are very different from multi-dimensional arrays in their implementation, we can also see why there is a different level of integrated support for the two. It's certainly not impossible to add support, just a question of demand and time.
(as a teaser, why only add support for jagged arrays? how about your own custom types?)
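The difference between the two kinds of array can be sketched as follows (JaggedDemo and the variable names are illustrative). The jagged elements still need an array creation expression, although the implicitly typed form new[] keeps it short.

```csharp
using System;

public static class JaggedDemo
{
    public static void Main()
    {
        // Rectangular: a single array object; nested initializers need no 'new'.
        int[,] rectangular = { { 1, 1 }, { 2, 2 }, { 3, 3 } };

        // Jagged: every element is an array of its own, so each one is
        // written as an array creation expression. 'new[]' infers int.
        int[][] jagged =
        {
            new[] { 1, 3, 5, 7, 9 },
            new[] { 0, 2, 4, 6 },
            new[] { 11, 22 }
        };

        // An inner array can be replaced by one of a different length.
        jagged[2] = new[] { 11, 22, 33 };

        Console.WriteLine(rectangular.Length); // 6 (total number of elements)
        Console.WriteLine(jagged.Length);      // 3 (number of inner arrays)
        Console.WriteLine(jagged[2].Length);   // 3
    }
}
```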

Retrieving values from multidimensional arrays

I have an array something like this:
int[,] multiDimensionalArray2 = { { 1, 2 }, { 4, 5 } };
if I want to retrieve 1 and 2 and feed them into this:
int a;
int b;
How do I do it?
Is it something like this:
multiDimensionalArray2[0,0]
What if I wanted to put more numbers in the same form e.g. { { 2, 1 }, { 4, 1 } };, in the same form as above. Would it be something like this:
int[,] multiDimensionalArray2 = { { 1, 2 }, { 4, 5 } },{ { 2, 1 }, { 4, 1 } };
To retrieve the second set would I do this, multiDimensionalArray2[1,1]

You're close. To retrieve the first two numbers, try this:
var a = multiDimensionalArray2[0, 0]; // a == 1
var b = multiDimensionalArray2[0, 1]; // b == 2
Did you give it a try and it didn't work? You'll notice that SO users will encourage you to use trial and error first. Come back when you hit a wall :)

Two-dimensional arrays are laid out like this (picture it in your mind), as per your example:
Col0 Col1
1 2
4 5
2 1
4 1
Now follow what Dominik suggested.

If you have a multi dimensional array like the one you gave,
int[,] multiDimensionalArray2 = { { 1, 2 }, { 4, 5 }, { 2, 1 }, { 4, 1 } };
We want to get the index of the number 5. To do this we need two indexes, [a,b]
Index a is the index of the "group of numbers" that you want to get.
First look at which group it is in. The first group (index 0) contains 1 and 2, the second group (index 1) contains 4 and 5.
Therefore a = 1.
Index b is the index of the "position within the group" that you want to get.
In the group {4, 5}, the number 5 is the second item (index 1). Therefore b = 1.
This means that the number 5 can be found at multiDimensionalArray2[1,1]
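The two-index lookup can be tried out with a small sketch like this one (IndexingDemo and the variable name grid are illustrative; the data is the array from the question). GetLength reports the size of each dimension, which helps when looping over the groups.

```csharp
using System;

public static class IndexingDemo
{
    public static void Main()
    {
        int[,] grid = { { 1, 2 }, { 4, 5 }, { 2, 1 }, { 4, 1 } };

        int a = grid[0, 0];    // first group, first item
        int b = grid[0, 1];    // first group, second item
        int five = grid[1, 1]; // second group, second item

        Console.WriteLine($"{a}, {b}, {five}"); // 1, 2, 5
        Console.WriteLine(grid.GetLength(0));   // 4 groups (rows)
        Console.WriteLine(grid.GetLength(1));   // 2 items per group (columns)
    }
}
```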

Chao, I searched the MSDN documentation for you; give these a try if you want.
Here is what I found on MSDN:
List< > >
Dictionary<>
I guess these would work for your problem too, since they can grow or shrink dynamically. Generic classes are said to be the better choice, although I am not sure whether they are faster.

LINQ - is SkipWhile broken?

I'm a bit surprised to find the results of the following code, where I simply want to remove all 3s from a sequence of ints:
var sequence = new [] { 1, 1, 2, 3 };
var result = sequence.SkipWhile(i => i == 3); // Oh noes! Returns { 1, 1, 2, 3 }
Why isn't 3 skipped?
My next thought was, OK, the Except operator will do the trick:
var sequence = new [] { 1, 1, 2, 3 };
var result = sequence.Except(i => i == 3); // Oh noes! Returns { 1, 2 }
In summary,
Except removes the 3, but also
removes non-distinct elements. Grr.
SkipWhile doesn't skip the last
element, even if it matches the
condition. Grr.
Can someone explain why SkipWhile doesn't skip the last element? And can anyone suggest what LINQ operator I can use to remove the '3' from the sequence above?

It's not broken. SkipWhile will only skip items in the beginning of the IEnumerable<T>. Once that condition isn't met it will happily take the rest of the elements. Other elements that later match it down the road won't be skipped.
int[] sequence = { 3, 3, 1, 1, 2, 3 };
var result = sequence.SkipWhile(i => i == 3);
// Result: 1, 1, 2, 3

var result = sequence.Where(i => i != 3);

The SkipWhile and TakeWhile operators skip or return elements from a sequence while a predicate function passes (returns True). The first element that doesn’t pass the predicate function ends the process of evaluation.
//Bypasses elements in a sequence as long as a specified condition is true and returns the remaining elements.
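The three operators can be compared side by side in a short sketch (SkipWhileDemo is an illustrative name; the input is the sequence used in the answer above). SkipWhile and TakeWhile only look at the leading run of matching elements, while Where tests every element.

```csharp
using System;
using System.Linq;

public static class SkipWhileDemo
{
    public static void Main()
    {
        int[] sequence = { 3, 3, 1, 1, 2, 3 };

        // Skips only the leading 3s; the trailing 3 is kept.
        Console.WriteLine(string.Join(", ", sequence.SkipWhile(i => i == 3))); // 1, 1, 2, 3

        // Returns only the leading 3s and stops at the first non-match.
        Console.WriteLine(string.Join(", ", sequence.TakeWhile(i => i == 3))); // 3, 3

        // Filters every element, wherever it occurs.
        Console.WriteLine(string.Join(", ", sequence.Where(i => i != 3)));     // 1, 1, 2
    }
}
```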

One solution you may find useful is the List<T>.FindAll method.
List<int> aggregator = new List<int> { 1, 2, 3, 3, 3, 4 };
List<int> result = aggregator.FindAll(b => b != 3);

Ahmad already answered your question, but here's another option:
var result = from i in sequence where i != 3 select i;
