How does Enumerable.OrderBy use keySelector - c#

Using the following code, I successfully managed to produce a collection of numbers and shuffle the elements' positions in the array:
var randomNumbers = Enumerable.Range(0, 100)
.OrderBy(x => Guid.NewGuid());
Everything works fine, but it has thrown a spanner in the works of my attempt to understand Enumerable.OrderBy. Take the following code for example:
var pupils = new[]
{
new Person() { Name = "Alex", Age = 17 },
new Person() { Name = "Jack", Age = 21 }
};
var query = pupils.OrderBy(x => x.Age);
It's my understanding that I am passing the property I wish to sort by and I presume that LINQ will use Comparer<T>.Default to determine how to order the collection if no explicit IComparer is specified for the second overload. I really fail to see how any of this reasonable logic can be applied to shuffle the array in such a way. So how does LINQ let me shuffle an array like this?

How does Enumerable.OrderBy use keySelector?
Enumerable.OrderBy returns lazily: keySelector is not called when OrderBy itself is called. The result is an IOrderedEnumerable<T> that performs the ordering when it is enumerated.
When enumerated, keySelector is called once for each element. The order of the keys defines the new order of the elements.
Here's a nifty sample implementation.
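As a rough illustration of the two points above, here is a minimal sketch under a simplified model. It is not the actual .NET source, and `OrderBySketch`/`SimpleOrderBy` are hypothetical names. Keys are computed once per element when enumeration starts, then the element positions are sorted by key with Comparer<TKey>.Default, keeping ties in their original order (the real OrderBy is also documented as a stable sort).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderBySketch
{
    // Hypothetical, simplified model of OrderBy (not the real implementation).
    // Because this is an iterator method, nothing runs until enumeration begins.
    public static IEnumerable<TSource> SimpleOrderBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        TSource[] items = source.ToArray();

        // 1. Call keySelector exactly once per element and remember each key.
        TKey[] keys = new TKey[items.Length];
        for (int i = 0; i < items.Length; i++)
            keys[i] = keySelector(items[i]);

        // 2. Sort the positions 0..n-1 by their keys. Array.Sort is not stable,
        //    so ties are broken by original position to keep the result stable.
        int[] positions = Enumerable.Range(0, items.Length).ToArray();
        Comparer<TKey> comparer = Comparer<TKey>.Default;
        Array.Sort(positions, (a, b) =>
        {
            int byKey = comparer.Compare(keys[a], keys[b]);
            return byKey != 0 ? byKey : a.CompareTo(b);
        });

        // 3. Yield the elements in key order.
        foreach (int p in positions)
            yield return items[p];
    }
}
```

With random keys such as Guid.NewGuid(), step 2 ends up with an arbitrary permutation of the positions, which is why the query appears to shuffle.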
So how does LINQ let me shuffle an array like this?
var randomNumbers = Enumerable
.Range(0, 100)
.OrderBy(x => Guid.NewGuid());
Guid.NewGuid is called for each element. The call for the second element may generate a value higher or lower than the call for the first element.
randomNumbers is an IOrderedEnumerable<int> that produces a different order each time it is enumerated: keySelector is called once per element every time randomNumbers is enumerated.
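Both points can be observed directly. A small sketch using only standard LINQ (`KeySelectorCalls` and `CountCalls` are hypothetical names) counts how often the Guid-producing key selector runs before and after enumerating the query:

```csharp
using System;
using System.Linq;

public static class KeySelectorCalls
{
    // Builds the shuffle query, then enumerates it `enumerations` times, counting
    // how often the key selector (the Guid.NewGuid() lambda) runs.
    public static (int beforeEnumeration, int afterEnumeration) CountCalls(int elementCount, int enumerations)
    {
        int calls = 0;
        var shuffled = Enumerable.Range(0, elementCount)
                                 .OrderBy(x => { calls++; return Guid.NewGuid(); });

        int before = calls; // OrderBy is deferred, so no keys have been computed yet

        for (int i = 0; i < enumerations; i++)
        {
            // Each enumeration re-runs the sort and computes a fresh key per element.
            foreach (int n in shuffled) { }
        }
        return (before, calls);
    }
}
```

For example, CountCalls(100, 2) reports 0 calls before enumeration and 200 afterwards.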

You're pretty close to understanding how such shuffling works. In your second case
pupils.OrderBy(x => x.Age);
the Comparer<int>.Default is used (the persons are sorted by their Age, simple).
In your first case, Comparer<Guid>.Default is used.
Now, how does that work?
Every call to Guid.NewGuid() produces a different Guid (duplicates are, for all practical purposes, impossible). Now when you do
var randomNumbers = Enumerable.Range(0, 100).OrderBy(x => Guid.NewGuid());
the numbers are sorted on the basis of the generated Guids.
Now what are guids?
They are 128-bit values, usually written in hexadecimal form. Because the space of possible values is so large, the chance of generating two identical Guids is negligible. Since newly generated Guids are effectively random, the ordering will be random too.
How are two Guids compared to enforce the ordering?
You can check this with a quick experiment:
using System;
using System.Linq;
using System.Numerics;
var guids = Enumerable.Range(0, 10).Select((x, i) =>
{
    Guid guid = Guid.NewGuid();
    return new { Guid = guid, NumberRepresentation = new BigInteger(guid.ToByteArray()), OriginalIndex = i };
}).ToArray();
var guidsOrderedByTheirNumberRepresentation = guids.OrderBy(x => x.NumberRepresentation).ToArray();
var guidsOrderedAsString = guids.OrderBy(x => x.Guid.ToString()).ToArray();
var randomNumbers = Enumerable.Range(0, 10).OrderBy(x => guids[x].Guid).ToArray();
// Almost always False: the BigInteger built from ToByteArray() uses a different byte order and is signed
Console.WriteLine(randomNumbers.SequenceEqual(guidsOrderedByTheirNumberRepresentation.Select(x => x.OriginalIndex)));
// True: ordering by Guid matches ordering by the Guid's string form
Console.WriteLine(randomNumbers.SequenceEqual(guidsOrderedAsString.Select(x => x.OriginalIndex)));
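So Comparer<Guid>.Default orders Guids the same way as their string representations (it compares the Guid's fields in the order they appear in the string), not by the BigInteger built from ToByteArray().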
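The same conclusion can be checked with an ordinal string comparison, which rules out any culture-specific string ordering. A minimal sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Linq;

public static class GuidOrderCheck
{
    // Orders the same random GUIDs two ways and reports whether the results agree:
    // 1) with Comparer<Guid>.Default (what OrderBy uses for Guid keys),
    // 2) with an ordinal comparison of their default string form.
    public static bool GuidOrderMatchesStringOrder(int count)
    {
        Guid[] guids = Enumerable.Range(0, count).Select(_ => Guid.NewGuid()).ToArray();
        Guid[] byGuid = guids.OrderBy(g => g).ToArray();
        Guid[] byString = guids.OrderBy(g => g.ToString(), StringComparer.Ordinal).ToArray();
        return byGuid.SequenceEqual(byString);
    }
}
```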
Aside:
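You should use a Fisher-Yates shuffle for speed. Maybe something like the following (note that it overwrites the contents of the list you pass in):
public static IEnumerable<T> Shuffle<T>(this IList<T> lst)
{
    Random rnd = new Random();
    // Walk backwards; positions 0..i still hold the elements not yet returned.
    for (int i = lst.Count - 1; i >= 0; i--)
    {
        int j = rnd.Next(i + 1);   // pick one of the remaining elements at random
        yield return lst[j];
        lst[j] = lst[i];           // move the last remaining element into the freed slot
    }
}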
Or, for conciseness, maybe just this (which can still be faster than the Guid approach):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> lst)
{
Random rnd = new Random();
return lst.OrderBy(x => rnd.Next());
}
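If you'd rather not modify the caller's list, here is a minimal sketch of the classic in-place Fisher-Yates shuffle applied to a copy. The `FisherYatesShuffle` name and the `Random` parameter (added so results can be reproduced with a fixed seed) are assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Shuffles a copy of the source with an unseeded Random.
    public static T[] FisherYatesShuffle<T>(this IEnumerable<T> source)
        => source.FisherYatesShuffle(new Random());

    // Classic Fisher-Yates: walk from the end, swapping each slot with a random
    // slot at or before it. Every permutation is equally likely, given a fair RNG.
    public static T[] FisherYatesShuffle<T>(this IEnumerable<T> source, Random rng)
    {
        T[] items = source.ToArray();              // copy, so the input is untouched
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);               // 0 <= j <= i
            (items[i], items[j]) = (items[j], items[i]);
        }
        return items;
    }
}
```

Usage: `int[] shuffled = Enumerable.Range(0, 100).FisherYatesShuffle();`. This is O(n), compared with the O(n log n) sort behind the OrderBy-based shuffles.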

So how does this work?
The following query uses Comparer<Guid>.Default for comparison:
.OrderBy(x => Guid.NewGuid())
Since every generated GUID is practically unique and is created inside the OrderBy call itself, the keys have no relation to the elements' original positions. The result therefore looks random, even though the elements are simply being sorted by their GUIDs.
If you run the query again, you will see a different (presumably shuffled) result, because a new set of GUIDs is generated.
If you use predefined GUIDs, you will get the same order every time.
For example, randomNumbers1 and randomNumbers2 below produce the same sequence:
var randomGuids = Enumerable.Range(0,10).Select (x => Guid.NewGuid()).ToArray();
var randomNumbers1 = Enumerable.Range(0, 10).OrderBy(x => randomGuids[x]);
var randomNumbers2 = Enumerable.Range(0, 10).OrderBy(x => randomGuids[x]);
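A runnable version of that example (the wrapper class and method names are hypothetical) that checks the two orderings really are identical:

```csharp
using System;
using System.Linq;

public static class PredefinedGuidOrder
{
    // Orders 0..count-1 twice using the same pre-generated GUIDs as keys.
    // Because the keys are fixed, both orderings come out identical.
    public static bool SameOrderTwice(int count)
    {
        Guid[] randomGuids = Enumerable.Range(0, count).Select(_ => Guid.NewGuid()).ToArray();
        int[] first = Enumerable.Range(0, count).OrderBy(x => randomGuids[x]).ToArray();
        int[] second = Enumerable.Range(0, count).OrderBy(x => randomGuids[x]).ToArray();
        return first.SequenceEqual(second);
    }
}
```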
I really fail to see how any of this reasonable logic can be applied to shuffle the array in such a way.
You are able to shuffle because the keys (the GUIDs in your example) have no relationship to the elements' original order. If you use keys that follow an order, you get ordered output instead of shuffled output.

Related

Linq get list of indexes matching a condition to filter another list against

I've got a list as a result of some pixel math like:
List<double> MList = new List<double>(new double[]{ 0.002, 0.123, 0.457, 0.237 ,0.1});
I would like to use LINQ to retrieve, from that list, all indexes of items below a value. So if the value to compare against is 0.15, it should result in the following indexes:
0,1,4
List<double> MClose = new list<double>();
double compare = 0.15;
List<double> MClose = MList.Where(item => item < compare).Select((item,index) => index);
I hope so far so good. Then I would like to use these indexes against another list, a list made of RGB values, to build a new list containing only the values selected by those indexes.
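class RGB { public int r, g, b; }
List<RGB> colors = new List<RGB> {
    new RGB { r = 10, g = 10, b = 2 }, new RGB { r = 13, g = 11, b = 2 }, new RGB { r = 15, g = 16, b = 17 },
    new RGB { r = 33, g = 13, b = 2 }, new RGB { r = 35, g = 116, b = 117 } };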
I haven't used LINQ a lot, and I wonder if this could be coded with LINQ, maybe even as a one-liner? I'm curious how short an answer could get.
And would LINQ be fast enough for pixel editing? I'm dealing with convolution maps here, usually 3x3 to 64x64 pixels of data.
List<double> MClose = MList.Where(item => item < compare).Select((item,index) => index);
First, you've defined MClose as a List<double>, but your final .Select((item,index) => index) returns an IEnumerable<int>, which isn't a List but a sequence that can be iterated over. Use var to infer the type of MClose, and use .ToList() so that the query is evaluated only once and its result is kept in memory. Also capture the index before filtering: after Where, the indexed Select numbers the filtered items (0, 1, 2) rather than their original positions (0, 1, 4):
var MClose = MList.Select((item, index) => new { item, index }).Where(x => x.item < compare).Select(x => x.index).ToList();
Then you can use the .Where clause with indexes:
var filteredColors = colors.Where((c,index)=> MClose.Contains(index)).ToList();
Use .Contains() to filter only those indexes that you've got in MClose.
You need to reorder your LINQ methods: first call Select, then Where:
List<double> MList = new List<double>(new double[] { 0.002, 0.123, 0.457, 0.237, 0.1 });
double compare = 0.15;
var idx = MList.Select((x, i) => new {x, i})
.Where(x => x.x < compare)
.Select(x => x.i)
.ToArray();
Now in idx you will have [0, 1, 4]
Some explanation: after you apply the Where method, the indices no longer match the originals. So you first need to save the original index, and then filter MList.
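Since MList and the color list are aligned by position, another option is to skip the index list entirely and pair the two lists with Zip. A minimal sketch, assuming both lists have the same length (Zip silently stops at the shorter one) and using a hypothetical RGB class with public r, g, b fields and a constructor:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's RGB class.
public class RGB
{
    public int r, g, b;
    public RGB(int r, int g, int b) { this.r = r; this.g = g; this.b = b; }
}

public static class ColorFilter
{
    // Keeps colors[i] whenever values[i] < threshold, without building an index list.
    public static List<RGB> SelectBelow(List<double> values, List<RGB> colors, double threshold)
    {
        return values.Zip(colors, (value, color) => (value, color))
                     .Where(pair => pair.value < threshold)
                     .Select(pair => pair.color)
                     .ToList();
    }
}
```

With the question's data and a threshold of 0.15, this keeps the colors at positions 0, 1 and 4.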
I would like to use Linq, to retrieve from that list, all indexes of items below a value, so if the value to compare against is 0.15 it should result the following indexes : 0,1,4
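You can get the index of an element by using IndexOf(). Note that IndexOf returns the first match, so this only gives correct results when the values in the list are unique: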
List<double> list = new List<double> { 0.002, 0.123, 0.457, 0.237, 0.1 };
List<int> indexes = list
.Where(q => q < 0.15)
.Select(q => list.IndexOf(q))
.ToList();
I hope so far so good, then i would like to use this gained index, to
use against another list. That's a list made out of RGB values, to
build a new list only out of values selected by that index.
That part does not make much sense to me.

Implicit Index or Generating Index in LINQ

I recently encountered a couple of cases that made me wonder whether there is any way to get the internal index or, if not, to generate an index efficiently when using LINQ.
Consider the following case:
List<int> list = new List<int>() { 7, 4, 5, 1, 7, 8 };
In C#, if we want to return the indexes of "7" in the above list using LINQ, we cannot simply use IndexOf/LastIndexOf, because they only return the first/last match.
This will not work:
var result = list.Where(x => x == 7).Select(y => list.IndexOf(y));
We have several workarounds for doing it, however.
For instance, one way of doing it could be this (which I typically do):
int index = 0;
var result = from x in list
let indexNow = index++
where x == 7
select indexNow;
This introduces an additional range variable, indexNow, via let.
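One caveat with the let/index++ approach: the counter lives outside the query, so it is not reset when the query is enumerated again. A minimal sketch demonstrating this (the wrapper names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetIndexPitfall
{
    // Runs the let/index++ query from the question twice and returns both results.
    public static (int[] first, int[] second) EnumerateTwice()
    {
        List<int> list = new List<int>() { 7, 4, 5, 1, 7, 8 };
        int index = 0;
        var result = from x in list
                     let indexNow = index++   // side effect: runs once per element per enumeration
                     where x == 7
                     select indexNow;

        int[] first = result.ToArray();  // index goes 0..5 during this pass: { 0, 4 }
        int[] second = result.ToArray(); // index is not reset and continues from 6: { 6, 10 }
        return (first, second);
    }
}
```

Materializing the query once with ToList()/ToArray(), or resetting index before each enumeration, avoids the problem.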
And also another way:
var result = Enumerable.Range(0, list.Count).Where(x => list.ElementAt(x) == 7);
It will generate another set of items by using Enumerable.Range and pick the result from it.
Now, I am wondering if there is any alternative, simpler way of doing it, such as:
if there is any (built-in) way to get the internal index of the IEnumerable without declaring something with let or
to generate the index for the IEnumerable using something other than Enumerable.Range (perhaps something like new? I am not too familiar with how to do that), or
anything else which could shorten the code but still getting the indexes of the IEnumerable.
As covered in "IEnumerable.Select with index", "How to get index using LINQ?" and similar questions, IEnumerable<T>.Select() has an overload that provides an index.
Using this, you can:
Project into an anonymous type that contains the index and the value (or into a Tuple<int, T> if you want to pass it to other methods, since anonymous types are local).
Filter those results for the value you're looking for.
Project only the index.
So the code will look like this:
var result = list.Select((v, i) => new { Index = i, Value = v })
.Where(i => i.Value == 7)
.Select(i => i.Index);
Now result will contain an enumerable containing all indexes.
Note that the index is simply the element's position in enumeration order, so it is only meaningful for source collections that keep a stable order and provide a numeric indexer, such as List<T>. If you use this on a Dictionary<TKey, TValue> or HashSet<T>, the resulting indexes cannot be used to index into the collection.
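The three steps can be wrapped in a small reusable helper. This is a minimal sketch; `IndexesWhere` is a hypothetical name, not a built-in LINQ method, and it uses a value tuple instead of an anonymous type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexExtensions
{
    // Returns the 0-based positions (in enumeration order) of elements matching predicate.
    public static IEnumerable<int> IndexesWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        return source.Select((value, index) => (value, index)) // 1. pair value with index
                     .Where(pair => predicate(pair.value))      // 2. filter on the value
                     .Select(pair => pair.index);               // 3. keep only the index
    }
}
```

Usage: `list.IndexesWhere(x => x == 7)` yields 0 and 4 for the list in the question.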

Check array for duplicates, return only items which appear more than once

I have a text document of emails such as
Google12#gmail.com,
MyUSERNAME#me.com,
ME#you.com,
ratonabat#co.co,
iamcool#asd.com,
ratonabat#co.co,
I need to check said document for duplicates and create a unique array from that (so if "ratonabat#co.co" appears 500 times in the new array he'll only appear once.)
Edit:
For an example:
username1#hotmail.com
username2#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1#hotmail.com
You can simply use Linq's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little LINQ:
var output = input.GroupBy(x => x)
.Where(g => g.Skip(1).Any())
.Select(g => g.Key)
.ToArray();
Explanation:
.GroupBy groups identical strings together.
.Where filters the groups by the following criterion:
.Skip(1).Any() returns true if there are 2 or more items in the group. In general this can be cheaper than .Count() > 1 because it stops after finding a second item; the groups produced by GroupBy already know their size, though, so here the difference is negligible.
.Select returns only the key of each group (a single string) rather than the whole group.
.ToArray converts the result set to an array.
Here's another solution using a custom extension method:
public static class MyExtensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
var a = new HashSet<T>();
var b = new HashSet<T>();
foreach(var x in input)
{
if (!a.Add(x) && b.Add(x))
yield return x;
}
}
}
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
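E-mail addresses usually should be compared case-insensitively. A minimal sketch of a variant of the Duplicates() method that accepts an equality comparer (the overload is hypothetical; HashSet<T> does have a constructor taking an IEqualityComparer<T>):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // Yields each value the first time it is seen a second time, using `comparer`
    // (e.g. StringComparer.OrdinalIgnoreCase) to decide what counts as equal.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);      // every value seen so far
        var reported = new HashSet<T>(comparer);  // values already yielded as duplicates
        foreach (var x in input)
        {
            // A repeated value that hasn't been reported yet.
            if (!seen.Add(x) && reported.Add(x))
                yield return x;
        }
    }
}
```

Usage: `input.Duplicates(StringComparer.OrdinalIgnoreCase).ToArray()`. The yielded string is the second occurrence as it appears in the input.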
You can use the built-in .Distinct() method. By default the comparisons are case-sensitive; to make them case-insensitive, use the overload that takes a comparer and pass a case-insensitive string comparer.
List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
EDIT: Now I see after you made your clarification you only want to list the duplicates.
string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
(key, rows) => new {Key = key, Count = rows.Count()},
StringComparer.OrdinalIgnoreCase)
.Where(row => row.Count > 1)
.Select(row => row.Key)
.ToArray();
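To select emails which occur more than once (trimming each entry, so the line breaks after the commas don't make otherwise identical addresses differ):
var dupEmails = from emails in File.ReadAllText(path).Split(',')
                                   .Select(x => x.Trim())
                                   .Where(x => x.Length > 0)
                                   .GroupBy(x => x)
                where emails.Count() > 1
                select emails.Key;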

Sort one list by another

I have two lists: one is just a list of ints, the other is a list of objects, and the objects have an ID property.
What I want to do is sort the list of objects by ID, in the same order as the list of ints.
I've been playing around for a while now trying to get it working, so far no joy.
Here is what i have so far...
//**************************
//*** Randomize the list ***
//**************************
if (Session["SearchResultsOrder"] != null)
{
// save the session as a int list
List<int> IDList = new List<int>((List<int>)Session["SearchResultsOrder"]);
// the saved list session exists, make sure the list is orded by this
foreach(var i in IDList)
{
SearchData.ReturnedSearchedMembers.OrderBy(x => x.ID == i);
}
}
else
{
// before any sorts randomize the results - this mixes it up a bit as before it would order the results by member registration date
List<Member> RandomList = new List<Member>(SearchData.ReturnedSearchedMembers);
SearchData.ReturnedSearchedMembers = GloballyAvailableMethods.RandomizeGenericList<Member>(RandomList, RandomList.Count).ToList();
// save the order of these results so they can be restored back during postback
List<int> SearchResultsOrder = new List<int>();
SearchData.ReturnedSearchedMembers.ForEach(x => SearchResultsOrder.Add(x.ID));
Session["SearchResultsOrder"] = SearchResultsOrder;
}
The whole point of this is so when a user searches for members, initially they display in a random order, then if they click page 2, they remain in that order and the next 20 results display.
I have been reading about the IComparer I can use as a parameter in the LINQ OrderBy clause, but I can't find any simple examples.
I’m hoping for an elegant, very simple LINQ style solution, well I can always hope.
Any help is most appreciated.
Another LINQ-approach:
var orderedByIDList = from i in ids
join o in objectsWithIDs
on i equals o.ID
select o;
One way of doing it:
List<int> order = ....;
List<Item> items = ....;
Dictionary<int,Item> d = items.ToDictionary(x => x.ID);
List<Item> ordered = order.Select(i => d[i]).ToList();
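The dictionary lookup throws a KeyNotFoundException if an ID in the order list has no matching item (for example, a member that was removed between postbacks). A minimal sketch that skips such IDs instead, using a hypothetical Member class with an ID property:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Member type.
public class Member
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

public static class IdOrdering
{
    // Returns the members in the order given by ids. IDs with no matching member are
    // skipped, and members whose ID is not listed are left out (like the join answer).
    public static List<Member> OrderByIdList(IEnumerable<Member> members, IEnumerable<int> ids)
    {
        // Throws ArgumentException if two members share an ID.
        Dictionary<int, Member> byId = members.ToDictionary(m => m.ID);

        var ordered = new List<Member>();
        foreach (int id in ids)
        {
            if (byId.TryGetValue(id, out Member member))
                ordered.Add(member);
        }
        return ordered;
    }
}
```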
Not an answer to this exact question, but if you have two arrays, there is an overload of Array.Sort that takes an array of keys and an array of items, and sorts both arrays according to the keys:
https://msdn.microsoft.com/en-us/library/85y6y2d3.aspx
Array.Sort Method (Array, Array)
Sorts a pair of one-dimensional Array objects (one contains the keys
and the other contains the corresponding items) based on the keys in
the first Array using the IComparable implementation of each key.
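For the question's case, a minimal sketch of how that overload could be applied: compute each item's position in the ID list as its key, then let Array.Sort(keys, items) reorder the items alongside the keys. `SortToMatch` is a hypothetical helper, and it assumes every item's ID appears in the ID list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArraySortByKeys
{
    // Returns a copy of items ordered like idOrder, using Array.Sort(keys, items).
    // Assumes every item's ID appears in idOrder (otherwise the lookup throws).
    public static T[] SortToMatch<T>(T[] items, Func<T, int> getId, IList<int> idOrder)
    {
        // Map each ID to its position in the desired order.
        var position = new Dictionary<int, int>();
        for (int i = 0; i < idOrder.Count; i++)
            position[idOrder[i]] = i;

        T[] sorted = (T[])items.Clone();                      // leave the input untouched
        int[] keys = sorted.Select(item => position[getId(item)]).ToArray();
        Array.Sort(keys, sorted);                             // reorders `sorted` along with its keys
        return sorted;
    }
}
```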
Join is the best candidate if you want to match on the exact integer (elements without a match are simply left out of the result). If you merely want to apply the sort order given by the other list (and provided both lists have the same number of elements), you can use Zip.
var result = objects.Zip(ints, (o, i) => new { o, i})
.OrderBy(x => x.i)
.Select(x => x.o);
Pretty readable.
Here is an extension method which encapsulates Simon D.'s response for lists of any type.
public static IEnumerable<TResult> SortBy<TResult, TKey>(this IEnumerable<TResult> sortItems,
IEnumerable<TKey> sortKeys,
Func<TResult, TKey> matchFunc)
{
return sortKeys.Join(sortItems,
k => k,
matchFunc,
(k, i) => i);
}
Usage is something like:
var sorted = toSort.SortBy(sortKeys, i => i.Key);
One possible solution:
myList = myList.OrderBy(x => Ids.IndexOf(x.Id)).ToList();
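Note: use this only if you are working with in-memory lists. It generally won't work against an IQueryable (for example with LINQ to Entities), because the query provider cannot translate List<T>.IndexOf. Also, items whose Id isn't in Ids get -1 from IndexOf and therefore come first.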
docs = docs.OrderBy(d => docsIds.IndexOf(d.Id)).ToList();
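OrderBy calls the key selector once per element, and each IndexOf call scans the ID list, so this costs O(n·m) for n items and m IDs. A minimal sketch that precomputes each ID's position once (hypothetical helper name), keeping the IndexOf behaviour of sorting unknown IDs first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositionOrdering
{
    public static List<T> OrderByIdPosition<T>(IEnumerable<T> items, Func<T, int> getId, IList<int> ids)
    {
        // Build the ID -> position lookup once (first occurrence wins, as with IndexOf).
        var position = new Dictionary<int, int>();
        for (int i = 0; i < ids.Count; i++)
        {
            if (!position.ContainsKey(ids[i]))
                position[ids[i]] = i;
        }

        // One O(1) lookup per item; unknown IDs get -1 and sort first, like IndexOf.
        return items.OrderBy(item => position.TryGetValue(getId(item), out int p) ? p : -1)
                    .ToList();
    }
}
```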

Using LINQ, how do I choose items at particular indexes?

If I have an IEnumerable<Foo> allFoos and an IEnumerable<Int32> bestFooIndexes, how can I get a new IEnumerable<Foo> bestFoos containing the Foo entries from allFoos at the indexes specified by bestFooIndexes?
var bestFoos = bestFooIndexes.Select(index => allFoos.ElementAt(index));
If you're worried about performance and the collections are large enough:
List<Foo> allFoosList = allFoos.ToList();
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);
Elisha's answer will certainly work, but it may be very inefficient... it depends on what allFoos is implemented by. If it's an implementation of IList<T>, ElementAt will be efficient - but if it's actually the result of (say) a LINQ to Objects query, then the query will be re-run for every index. So it may be more efficient to write:
var allFoosList = allFoos.ToList();
// Given that we *know* allFoosList is a list, we can just use the indexer
// rather than getting ElementAt to perform the optimization on each iteration
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);
You could do this only when required, of course:
IList<Foo> allFoosList = allFoos as IList<Foo> ?? allFoos.ToList();
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);
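The re-evaluation cost described above can be observed with a lazily generated sequence that counts the items it produces. A minimal sketch (all names hypothetical): picking indexes 1 and 3 with ElementAt makes the sequence produce 2 + 4 = 6 items, while materializing it once with ToList produces each of its 5 items exactly once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtCost
{
    // A lazy sequence (an iterator, not an IList<T>) that reports each item it produces.
    private static IEnumerable<int> CountingSequence(int length, Action onItemProduced)
    {
        for (int i = 0; i < length; i++)
        {
            onItemProduced();
            yield return i * 10;
        }
    }

    // ElementAt on a non-list sequence restarts the enumeration for every index.
    public static int ItemsProducedWithElementAt(int length, int[] indexes)
    {
        int produced = 0;
        IEnumerable<int> allFoos = CountingSequence(length, () => produced++);
        int[] bestFoos = indexes.Select(index => allFoos.ElementAt(index)).ToArray();
        return produced;
    }

    // Materializing once with ToList() enumerates the source a single time.
    public static int ItemsProducedWithToList(int length, int[] indexes)
    {
        int produced = 0;
        IEnumerable<int> allFoos = CountingSequence(length, () => produced++);
        List<int> allFoosList = allFoos.ToList();
        int[] bestFoos = indexes.Select(index => allFoosList[index]).ToArray();
        return produced;
    }
}
```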
You could make an extension method like so:
public static class EnumerableExtensions
{
    public static IEnumerable<T> ElementsAt<T>(this IEnumerable<T> list, IEnumerable<int> indexes)
    {
        foreach (var index in indexes)
        {
            yield return list.ElementAt(index);
        }
    }
}
Then you could go something like this
var bestFoos = allFoos.ElementsAt(bestFooIndexes);
Another solution based on join:
var bestFoos = from entry in allFoos.Select((a, i) => new { Index = i, Element = a })
               join index in bestFooIndexes on entry.Index equals index
               select entry.Element;
Jon Skeet's / Elisha's answer is the way to go.
Here's a slightly different solution, less efficient in all likelihood:
var bestFooIndices = new HashSet<int>(bestFooIndexes);
var bestFoos = allFoos.Where((foo, index) => bestFooIndices.Contains(index));
Repeats contained in bestFooIndexes will not produce duplicates in the result. Additionally, elements in the result will be ordered by their enumeration order in allFoos rather than by the order in which they are present in bestFooIndexes.
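A small sketch contrasting the two behaviours (hypothetical helper names): the Select-based approach follows the order of the indexes and keeps repeats, while the HashSet/Where approach follows the source order and drops them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PickComparison
{
    // Index-driven: follows the order of `indexes` and keeps repeated indexes.
    public static string[] ByIndexOrder(IList<string> all, IEnumerable<int> indexes)
        => indexes.Select(index => all[index]).ToArray();

    // Filter-driven: follows the order of `all` and ignores repeated indexes.
    public static string[] BySourceOrder(IEnumerable<string> all, IEnumerable<int> indexes)
    {
        var wanted = new HashSet<int>(indexes);
        return all.Where((item, index) => wanted.Contains(index)).ToArray();
    }
}
```

For all = a, b, c, d and indexes 2, 0, 2, the first gives c, a, c and the second gives a, c.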
var bestFoosFromAllFoos = allFoos.Where((s) => bestFoos.Contains(s));
