Using LINQ, how do I choose items at particular indexes? - c#

If I have an IEnumerable<Foo> allFoos and an IEnumerable<Int32> bestFooIndexes, how can I get a new IEnumerable<Foo> bestFoos containing the Foo entries from allFoos at the indexes specified by bestFooIndexes?

var bestFoos = bestFooIndexes.Select(index => allFoos.ElementAt(index));
If you're worried about performance and the collections are large enough:
List<Foo> allFoosList = allFoos.ToList();
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);

Elisha's answer will certainly work, but it may be very inefficient... it depends on what allFoos is implemented by. If it's an implementation of IList<T>, ElementAt will be efficient - but if it's actually the result of (say) a LINQ to Objects query, then the query will be re-run for every index. So it may be more efficient to write:
var allFoosList = allFoos.ToList();
// Given that we *know* allFoosList is a list, we can just use the indexer
// rather than getting ElementAt to perform the optimization on each iteration
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);
You could do this only when required, of course:
IList<Foo> allFoosList = allFoos as IList<Foo> ?? allFoos.ToList();
var bestFoos = bestFooIndexes.Select(index => allFoosList[index]);
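To make the cost difference concrete, here is a minimal sketch that counts how many source items get visited. The ElementAtDemo class, the ItemsVisited counter, and the sample data are made up for illustration; only ElementAt, ToList, and the indexer come from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // Counts how many source items the lazy sequence has produced (illustrative only).
    public static int ItemsVisited;

    // A lazy sequence standing in for "the result of a LINQ to Objects query".
    public static IEnumerable<string> LazyFoos()
    {
        foreach (var foo in new[] { "a", "b", "c", "d", "e" })
        {
            ItemsVisited++;
            yield return foo;
        }
    }

    // ElementAt on a non-list source walks from the start for every index.
    public static List<string> PickWithElementAt(IEnumerable<string> allFoos, IEnumerable<int> indexes)
    {
        return indexes.Select(index => allFoos.ElementAt(index)).ToList();
    }

    // Materialize once (only if needed), then use the indexer.
    public static List<string> PickWithList(IEnumerable<string> allFoos, IEnumerable<int> indexes)
    {
        IList<string> allFoosList = allFoos as IList<string> ?? allFoos.ToList();
        return indexes.Select(index => allFoosList[index]).ToList();
    }

    public static void Main()
    {
        var indexes = new[] { 4, 0, 3 };

        ItemsVisited = 0;
        var viaElementAt = PickWithElementAt(LazyFoos(), indexes);
        Console.WriteLine(string.Join(",", viaElementAt) + " after visiting " + ItemsVisited + " items");

        ItemsVisited = 0;
        var viaList = PickWithList(LazyFoos(), indexes);
        Console.WriteLine(string.Join(",", viaList) + " after visiting " + ItemsVisited + " items");
    }
}
```

With these inputs the ElementAt version visits 5 + 1 + 4 = 10 items, while the list version visits each of the 5 items once. The gap grows with the number and size of the indexes.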

You could make an extension method like so:
// Extension methods must be static, generic on T here, and declared in a static class.
public static IEnumerable<T> ElementsAt<T>(this IEnumerable<T> list, IEnumerable<int> indexes)
{
    foreach (var index in indexes)
    {
        yield return list.ElementAt(index);
    }
}
Then you could do something like this:
var bestFoos = allFoos.ElementsAt(bestFooIndexes);

Another solution based on join:
var bestFoos = from entry in allFoos
                   .Select((a, i) => new { Index = i, Element = a })
               join index in bestFooIndexes on entry.Index equals index
               select entry.Element;

Jon Skeet's / Elisha's answer is the way to go.
Here's a slightly different solution, less efficient in all likelihood:
var bestFooIndices = new HashSet<int>(bestFooIndexes);
var bestFoos = allFoos.Where((foo, index) => bestFooIndices.Contains(index));
Repeats contained in bestFooIndexes will not produce duplicates in the result. Additionally, elements in the result will be ordered by their enumeration order in allFoos rather than by the order in which they are present in bestFooIndexes.
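The two behavioural differences just described (no duplicates, source order instead of index order) can be seen side by side in a small sketch. The IndexFilterDemo class, its method names, and the sample data are illustrative assumptions, not part of the original answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexFilterDemo
{
    // Select over the indexes: keeps the order and the repeats of the index list.
    public static List<T> PickInIndexOrder<T>(IList<T> all, IEnumerable<int> indexes)
    {
        return indexes.Select(index => all[index]).ToList();
    }

    // Where over the source with a HashSet of wanted indexes:
    // keeps source order and yields each matching element once.
    public static List<T> PickInSourceOrder<T>(IEnumerable<T> all, IEnumerable<int> indexes)
    {
        var wanted = new HashSet<int>(indexes);
        return all.Where((item, index) => wanted.Contains(index)).ToList();
    }

    public static void Main()
    {
        var allFoos = new List<string> { "a", "b", "c", "d" };
        var bestFooIndexes = new[] { 3, 1, 3 };

        // Index order with the repeat preserved: d, b, d
        Console.WriteLine(string.Join(", ", PickInIndexOrder(allFoos, bestFooIndexes)));

        // Source order without the repeat: b, d
        Console.WriteLine(string.Join(", ", PickInSourceOrder(allFoos, bestFooIndexes)));
    }
}
```

Which behaviour is wanted depends on whether bestFooIndexes is meant as an ordered pick list or as a set of positions.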

var bestFoosFromAllFoos = allFoos.Where((s) => bestFoos.Contains(s));

Related

LINQ yield all elements

When I want to get an IEnumerable to eagerly materialize/yield all its results, I usually use ToList() like this:
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x).ToList();
I usually do this when locking in a method that returns the result of a LINQ query.
In these kinds of cases I am not actually interested in the collection becoming a list, and I often don't want to know its type. I am just using ToList() for its side effect: yielding all the elements.
If, for example, I change the type from List to Array, I will also have to remember to change ToList() to ToArray() or suffer some performance hit.
I can do foreach (var e in myList) { }, but I am not sure whether this will be optimized away at some point?
I am looking for something like myList.Select(item => item.x).yield()
What is the best way to do it? Is there a way to simply tell a LINQ result to yield all its elements that is better than ToList()?
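If the point is just to exercise the list, and you don't want to construct or allocate a collection of any kind, you can use Last(), which (for a source that is not an IList<T>) will simply iterate over all the elements until it gets to the last one (see source). Note that Last() throws on an empty sequence; LastOrDefault() does not.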
If you are actually interested in the results, in most cases you should simply use ToList() and don't overthink it.
There is no way to avoid allocating some sort of storage if you want to retrieve the results later. There is no magic IEnumerable<T> container that has no concrete type; you have to choose one, and ToList() is the most obvious choice with low overhead.
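If only the side effect of enumerating is wanted and the results themselves are not, a tiny helper can run the query without allocating a collection. This is a sketch under that assumption: Consume is a hypothetical extension method (it is not part of LINQ), and the Squares iterator with its SideEffects counter exists only to show when the work happens.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Hypothetical helper: enumerates the whole sequence and discards every element.
    public static void Consume<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (var item in source)
        {
            // Intentionally empty: we only want the enumeration to happen.
        }
    }
}

public static class ConsumeDemo
{
    // Counts how many elements the iterator has produced (illustrative only).
    public static int SideEffects;

    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 0; i < count; i++)
        {
            SideEffects++;
            yield return i * i;
        }
    }

    public static void Main()
    {
        SideEffects = 0;
        var query = Squares(3);          // deferred: nothing has run yet
        Console.WriteLine(SideEffects);  // 0

        query.Consume();                 // forces the enumeration
        Console.WriteLine(SideEffects);  // 3
    }
}
```

Keep in mind that nothing is cached here: enumerating query again re-runs the iterator, which is exactly why ToList() remains the right choice when the results are needed later.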
Don't forget ToListAsync() (available for Entity Framework queries, for example) if you'd rather not block while waiting for it to finish.
Just an FYI, since maybe that is the issue:
You don't have to write LINQ operations as a one-liner; you can build the query up step by step.
For example:
var myList = new List<T>();
var result = myList.Select(x => x.Foo).Where(x => x.City == "Vienna").Where(x => x.Big == true).ToList();
Could be re-written to:
var myList = new List<T>();
//get an IEnumerable<Foo>
var foos = myList.Select(x => x.Foo);
//get an IEnumerable<Foo> which is filtered by the City Vienna
var foosByCity = foos.Where(x => x.City == "Vienna");
//get an IEnumerable<Foo> which is further filtered by Big == true
var foosByCityByBig = foosByCity.Where(x => x.Big == true);
//now you could call ToList on the last IEnumerable, but you don't have to
var result = foosByCityByBig.ToList();
So whatever your real goal is, maybe you can change your lines
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x).ToList();
To this:
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x);
And continue your work with myXs as an IEnumerable<int>.

Finding the list of common objects between two lists

I have a list of objects of a class, for example:
class MyClass
{
    public string id;
    public string name;
    public string lastname;
}
so for example: List<MyClass> myClassList;
I also have a list of strings containing some ids, for example:
List<string> myIdList;
Now I am looking for a method that accepts these two as parameters and returns a List<MyClass> of the objects whose id appears in myIdList.
NOTE: Always the bigger list is myClassList and always myIdList is a smaller subset of that.
How can we find this intersection?
So you're looking to find all the elements in myClassList where myIdList contains the ID? That suggests:
var query = myClassList.Where(c => myIdList.Contains(c.id));
Note that if you could use a HashSet<string> instead of a List<string>, each Contains test will potentially be more efficient - certainly if your list of IDs grows large. (If the list of IDs is tiny, there may well be very little difference at all.)
It's important to consider the difference between a join and the above approach in the face of duplicate elements in either myClassList or myIdList. A join will yield every matching pair - the above will yield either 0 or 1 element per item in myClassList.
Which of those you want is up to you.
EDIT: If you're talking to a database, it would be best if you didn't use a List<T> for the entities in the first place - unless you need them for something else, it would be much more sensible to do the query in the database than fetching all the data and then performing the query locally.
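The join-versus-Contains difference described above is easiest to see with duplicate ids. The following is a minimal sketch; the JoinVsContainsDemo class, its method names, and the sample data are assumptions made for illustration, and MyClass is trimmed down to two public fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class MyClass
{
    public string id;
    public string name;
}

public static class JoinVsContainsDemo
{
    // Where + Contains: each item of myClassList appears at most once.
    // A HashSet keeps each Contains test cheap.
    public static List<MyClass> FilterWithContains(List<MyClass> myClassList, List<string> myIdList)
    {
        var ids = new HashSet<string>(myIdList);
        return myClassList.Where(c => ids.Contains(c.id)).ToList();
    }

    // Join: one result per matching (item, id) pair, so repeated ids repeat the item.
    public static List<MyClass> FilterWithJoin(List<MyClass> myClassList, List<string> myIdList)
    {
        return myClassList.Join(myIdList, item => item.id, id => id, (item, id) => item).ToList();
    }

    public static void Main()
    {
        var myClassList = new List<MyClass>
        {
            new MyClass { id = "1", name = "Ann" },
            new MyClass { id = "2", name = "Bob" },
            new MyClass { id = "3", name = "Cy" }
        };
        var myIdList = new List<string> { "1", "1", "2" }; // note the repeated "1"

        // Ann, Bob
        Console.WriteLine(string.Join(", ", FilterWithContains(myClassList, myIdList).Select(c => c.name)));

        // Ann, Ann, Bob
        Console.WriteLine(string.Join(", ", FilterWithJoin(myClassList, myIdList).Select(c => c.name)));
    }
}
```

If the ids are known to be unique on both sides, the two approaches return the same items.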
That isn't strictly an intersection (unless the ids are unique), but you can simply use Contains, i.e.
var sublist = myClassList.Where(x => myIdList.Contains(x.id));
You will, however, get significantly better performance if you create a HashSet<T> first:
var hash = new HashSet<string>(myIdList);
var sublist = myClassList.Where(x => hash.Contains(x.id));
You can use a join between the two lists:
return myClassList.Join(
myIdList,
item => item.id,
id => id,
(item, id) => item)
.ToList();
This is a kind of intersection between two lists: you want the items from one list that are present in the second list. Here the ToList() call executes the query immediately.
var lst = myClassList.Where(x => myIdList.Contains(x.id)).ToList();
You can use the following code:
var samedata = myClassList.Where(p => myIdList.Any(q => q == p.id));
myClassList.Where(x => myIdList.Contains(x.id));
Try
List<MyClass> GetMatchingObjects(List<MyClass> classList, List<string> idList)
{
return classList.Where(myClass => idList.Any(x => myClass.id == x)).ToList();
}
var q = myClassList.Where(x => myIdList.Contains(x.id));

Check array for duplicates, return only items which appear more than once

I have a text document of emails such as
Google12#gmail.com,
MyUSERNAME#me.com,
ME#you.com,
ratonabat#co.co,
iamcool#asd.com,
ratonabat#co.co,
I need to check said document for duplicates and create a unique array from that (so if "ratonabat#co.co" appears 500 times in the new array he'll only appear once.)
Edit:
For an example:
username1#hotmail.com
username2#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1#hotmail.com
You can simply use Linq's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little Linq:
var output = input.GroupBy(x => x)
.Where(g => g.Skip(1).Any())
.Select(g => g.Key)
.ToArray();
Explanation:
.GroupBy group identical strings together
.Where filter the groups by the following criteria
.Skip(1).Any() return true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
.Select return a set consisting only of a single string (rather than the group)
.ToArray convert the result set to an array.
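As a worked example of the two queries explained above, here is a small sketch run against data shaped like the question's sample. The DuplicateDemo class and its method names are illustrative assumptions; the LINQ calls are the ones from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateDemo
{
    // Every distinct value once.
    public static string[] Unique(IEnumerable<string> input)
    {
        return input.Distinct().ToArray();
    }

    // Only the values that occur two or more times, each listed once.
    public static string[] DuplicatesOnly(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)
                    .Where(g => g.Skip(1).Any()) // true when the group has a second item
                    .Select(g => g.Key)
                    .ToArray();
    }

    public static void Main()
    {
        var input = new[]
        {
            "username1#hotmail.com",
            "username2#hotmail.com",
            "username1#hotmail.com",
            "username3#hotmail.com",
            "username2#hotmail.com"
        };

        // Three distinct addresses.
        Console.WriteLine(string.Join(", ", Unique(input)));

        // Only username1 and username2 occur more than once.
        Console.WriteLine(string.Join(", ", DuplicatesOnly(input)));
    }
}
```

Both queries compare strings case-sensitively by default; pass a comparer such as StringComparer.OrdinalIgnoreCase to Distinct or GroupBy if addresses should match regardless of case.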
Here's another solution using a custom extension method:
public static class MyExtensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
var a = new HashSet<T>();
var b = new HashSet<T>();
foreach(var x in input)
{
if (!a.Add(x) && b.Add(x))
yield return x;
}
}
}
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
You can use the built-in .Distinct() method. By default the comparisons are case sensitive; if you want to make them case insensitive, use the overload that takes a comparer and pass a case-insensitive string comparer.
List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
EDIT: Now I see after you made your clarification you only want to list the duplicates.
string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
(key, rows) => new {Key = key, Count = rows.Count()},
StringComparer.OrdinalIgnoreCase)
.Where(row => row.Count > 1)
.Select(row => row.Key)
.ToArray();
To select emails which occur more than once:
var dupEmails=from emails in File.ReadAllText(path).Split(',').GroupBy(x=>x)
where emails.Count()>1
select emails.Key;

How does Enumerable.OrderBy use keySelector

Using the following code, I successfully managed to produce a collection of numbers and shuffle the elements' position in the array:
var randomNumbers = Enumerable.Range(0, 100)
.OrderBy(x => Guid.NewGuid());
Everything functions fine but it's kind of thrown a spanner in my works when trying to understand Enumerable.OrderBy. Take the following code for example:
var pupils = new[]
{
new Person() { Name = "Alex", Age = 17 },
new Person() { Name = "Jack", Age = 21 }
};
var query = pupils.OrderBy(x => x.Age);
It's my understanding that I am passing the property I wish to sort by and I presume that LINQ will use Comparer<T>.Default to determine how to order the collection if no explicit IComparer is specified for the second overload. I really fail to see how any of this reasonable logic can be applied to shuffle the array in such a way. So how does LINQ let me shuffle an array like this?
How does Enumerable.OrderBy use keySelector?
Enumerable.OrderBy<T> lazily returns - keySelector is not called directly. The result is an IOrderedEnumerable<T> that will perform the ordering when enumerated.
When enumerated, keySelector is called once for each element. The order of the keys defines the new order of the elements.
Here's a nifty sample implementation.
So how does LINQ let me shuffle an array like this?
var randomNumbers = Enumerable
.Range(0, 100)
.OrderBy(x => Guid.NewGuid());
Guid.NewGuid is called for each element. The call for the second element may generate a value higher or lower than the call for first element.
randomNumbers is an IOrderedEnumerable<int> that produces a different order each time it is enumerated. KeySelector is called once per element each time randomNumbers is enumerated.
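The "lazy, then once per element per enumeration" behaviour can be checked with a key selector that counts its own calls. This is only a sketch: the KeySelectorDemo class and its Calls counter are made up for illustration, and a deterministic key (the negated number) replaces Guid.NewGuid() so the result is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeySelectorDemo
{
    // Number of times the key selector has been invoked (illustrative only).
    public static int Calls;

    // Orders descending by using the negated value as the key, counting each call.
    public static IOrderedEnumerable<int> BuildQuery(int[] numbers)
    {
        return numbers.OrderBy(n =>
        {
            Calls++;
            return -n;
        });
    }

    public static void Main()
    {
        Calls = 0;
        var query = BuildQuery(new[] { 1, 2, 3 });
        Console.WriteLine(Calls);                    // 0: OrderBy has not run the selector yet

        var first = query.ToList();
        Console.WriteLine(string.Join(",", first));  // 3,2,1
        Console.WriteLine(Calls);                    // 3: one call per element

        var second = query.ToList();
        Console.WriteLine(Calls);                    // 6: enumerating again calls it again
    }
}
```

With Guid.NewGuid() as the key, that second enumeration is what produces a fresh set of keys and therefore a different order.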
You're pretty close to understanding how such shuffling works. In your second case
pupils.OrderBy(x => x.Age);
the Comparer<int>.Default is used (the persons are sorted by their Age, simple).
In your first case, Comparer<Guid>.Default is used.
Now how does that work?
Every time you call Guid.NewGuid(), a (presumably) new, non-duplicated Guid is produced. Now when you do
var randomNumbers = Enumerable.Range(0, 100).OrderBy(x => Guid.NewGuid());
the numbers are sorted on the basis of the generated Guids.
Now what are guids?
They are 128-bit integers, usually written in hexadecimal form. Since 2^128 is such a large number, the chance of generating the same Guid twice is almost zero. Since Guids exhibit some sort of randomness, the ordering will be random too.
How are two Guids compared to enforce the ordering?
You can confirm it based on a trivial experiment. Do:
var guids = Enumerable.Range(0, 10).Select((x, i) =>
{
Guid guid = Guid.NewGuid();
return new { Guid = guid, NumberRepresentation = new BigInteger(guid.ToByteArray()), OriginalIndex = i };
}).ToArray();
var guidsOrderedByTheirNumberRepresentation = guids.OrderBy(x => x.NumberRepresentation).ToArray();
var guidsOrderedAsString = guids.OrderBy(x => x.Guid.ToString()).ToArray();
var randomNumbers = Enumerable.Range(0, 10).OrderBy(x => guids[x].Guid).ToArray();
//print randomNumbers.SequenceEqual(guidsOrderedByTheirNumberRepresentation.Select(x => x.OriginalIndex)) => false
//print randomNumbers.SequenceEqual(guidsOrderedAsString.Select(x => x.OriginalIndex)) => true
So Comparer<Guid>.Default orders Guids consistently with their string representation, not with their numeric (BigInteger) value.
Aside:
You should use Fisher-Yates shuffling for speed. Maybe:
public static IEnumerable<T> Shuffle<T>(this IList<T> lst)
{
Random rnd = new Random();
for (int i = lst.Count - 1; i >= 0; i--)
{
int j = rnd.Next(i + 1);
yield return lst[j];
lst[j] = lst[i];
}
}
Or, for conciseness, maybe just this (which can still be faster than the Guid approach):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> lst)
{
Random rnd = new Random();
return lst.OrderBy(x => rnd.Next());
}
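Note that the first Shuffle above overwrites entries of the list it is given while it yields. If the caller's list must stay intact, a variant that shuffles a copy is one option. The sketch below is an assumption-laden illustration: ShuffledCopy is a hypothetical name, and taking the Random as a parameter is a design choice made here so that callers can seed it for repeatable runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Hypothetical helper: returns a shuffled copy and leaves the source untouched.
    public static List<T> ShuffledCopy<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        var result = source.ToList(); // work on a copy

        // Fisher-Yates: walk from the end, swapping each slot with a random slot at or before it.
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (result[i], result[j]) = (result[j], result[i]);
        }

        return result;
    }
}

public static class ShuffleDemo
{
    public static void Main()
    {
        var numbers = Enumerable.Range(0, 10).ToList();
        var shuffled = numbers.ShuffledCopy(new Random());

        Console.WriteLine(string.Join(",", shuffled)); // some permutation of 0..9
        Console.WriteLine(string.Join(",", numbers));  // still 0..9 in order
    }
}
```

The copy costs one extra list allocation, which is usually a fair price for not surprising the caller.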
So how does this work?
Following query uses Comparer<Guid>.Default for comparison.
.OrderBy(x => Guid.NewGuid())
Since every generated GUID is practically unique (you are generating them inside the OrderBy clause itself), you believe that you are getting a random order (which is an incorrect understanding).
If you run the query again, you will again see a (presumably) shuffled result, because a new set of GUIDs will be generated.
If you use predefined GUIDs, you will see a stable order.
For example, randomNumbers1 and randomNumbers2 below have the same values.
var randomGuids = Enumerable.Range(0,10).Select (x => Guid.NewGuid()).ToArray();
var randomNumbers1 = Enumerable.Range(0, 10).OrderBy(x => randomGuids[x]);
var randomNumbers2 = Enumerable.Range(0, 10).OrderBy(x => randomGuids[x]);
I really fail to see how any of this reasonable logic can be applied to shuffle the array in such a way.
You are able to shuffle because there is no order between the elements (the GUIDs in your example). If you use keys that are ordered, you will get ordered output instead of a shuffled one.

Sort one list by another

I have two list objects: one is just a list of ints, the other is a list of objects, where each object has an ID property.
What I want to do is sort the list of objects by ID, in the same order as the list of ints.
I've been playing around for a while now trying to get it working; so far, no joy.
Here is what I have so far...
//**************************
//*** Randomize the list ***
//**************************
if (Session["SearchResultsOrder"] != null)
{
// save the session as a int list
List<int> IDList = new List<int>((List<int>)Session["SearchResultsOrder"]);
// the saved list session exists, make sure the list is ordered by this
foreach(var i in IDList)
{
SearchData.ReturnedSearchedMembers.OrderBy(x => x.ID == i);
}
}
else
{
// before any sorts randomize the results - this mixes it up a bit as before it would order the results by member registration date
List<Member> RandomList = new List<Member>(SearchData.ReturnedSearchedMembers);
SearchData.ReturnedSearchedMembers = GloballyAvailableMethods.RandomizeGenericList<Member>(RandomList, RandomList.Count).ToList();
// save the order of these results so they can be restored back during postback
List<int> SearchResultsOrder = new List<int>();
SearchData.ReturnedSearchedMembers.ForEach(x => SearchResultsOrder.Add(x.ID));
Session["SearchResultsOrder"] = SearchResultsOrder;
}
The whole point of this is so when a user searches for members, initially they display in a random order, then if they click page 2, they remain in that order and the next 20 results display.
I have been reading about the IComparer I can pass as a parameter to the LINQ OrderBy clause, but I can't find any simple examples.
I'm hoping for an elegant, very simple LINQ-style solution; well, I can always hope.
Any help is most appreciated.
Another LINQ-approach:
var orderedByIDList = from i in ids
join o in objectsWithIDs
on i equals o.ID
select o;
One way of doing it:
List<int> order = ....;
List<Item> items = ....;
Dictionary<int,Item> d = items.ToDictionary(x => x.ID);
List<Item> ordered = order.Select(i => d[i]).ToList();
Not an answer to this exact question, but if you have two arrays, there is an overload of Array.Sort that takes the array to sort, and an array to use as the 'key'
https://msdn.microsoft.com/en-us/library/85y6y2d3.aspx
Array.Sort Method (Array, Array)
Sorts a pair of one-dimensional Array objects (one contains the keys
and the other contains the corresponding items) based on the keys in
the first Array using the IComparable implementation of each key.
Join is the best candidate if you want to match on the exact integer (if no match is found you get an empty sequence). If you merely want to apply the sort order of the other list (and provided the number of elements in both lists is equal), you can use Zip.
var result = objects.Zip(ints, (o, i) => new { o, i})
.OrderBy(x => x.i)
.Select(x => x.o);
Pretty readable.
Here is an extension method which encapsulates Simon D.'s response for lists of any type.
public static IEnumerable<TResult> SortBy<TResult, TKey>(this IEnumerable<TResult> sortItems,
IEnumerable<TKey> sortKeys,
Func<TResult, TKey> matchFunc)
{
return sortKeys.Join(sortItems,
k => k,
matchFunc,
(k, i) => i);
}
Usage is something like:
var sorted = toSort.SortBy(sortKeys, i => i.Key);
One possible solution:
myList = myList.OrderBy(x => Ids.IndexOf(x.Id)).ToList();
Note: use this only when working with in-memory lists; it doesn't work for IQueryable sources, because IQueryable does not contain a definition for IndexOf.
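For longer id lists, calling IndexOf for every element scans the list each time. One possible refinement, sketched here under the assumption that everything is in memory, is to build a position lookup once. OrderByIds and its getId parameter are hypothetical names; ids missing from the list get position -1 so they sort first, matching what IndexOf would return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortByListDemo
{
    // Hypothetical helper: orders items by the position of their id in the ids list.
    public static List<T> OrderByIds<T>(IEnumerable<T> items, IList<int> ids, Func<T, int> getId)
    {
        // Map each id to the index of its first occurrence, like IndexOf would report.
        var position = new Dictionary<int, int>();
        for (int i = 0; i < ids.Count; i++)
        {
            if (!position.ContainsKey(ids[i]))
            {
                position[ids[i]] = i;
            }
        }

        // Ids that are not in the list get -1, the same value IndexOf returns.
        return items
            .OrderBy(item => position.TryGetValue(getId(item), out var p) ? p : -1)
            .ToList();
    }

    public static void Main()
    {
        var items = new[] { 1, 2, 3, 4 };   // stand-ins for objects with an ID
        var order = new[] { 3, 1, 2 };      // the saved order; 4 is not in it

        var sorted = OrderByIds(items, order, x => x);
        Console.WriteLine(string.Join(",", sorted)); // 4,3,1,2
    }
}
```

Because OrderBy is a stable sort, items that share a position (for example several missing ids) keep their original relative order.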
docs = docs.OrderBy(d => docsIds.IndexOf(d.Id)).ToList();
