How to custom sort a list of strings given a desired sorting hierarchy? - c#

I would like to sort a List<string> in a particular way. Below is a unit test showing the input, the specific way (which I am calling a "hierarchy" - feel free to correct my terminology so that I may learn), and the desired output. The code should be self-explanatory.
[Test]
public void CustomSortByHierarchy()
{
List<string> input = new List<string>{"TJ", "DJ", "HR", "HR", "TJ"};
List<string> hierarchy = new List<string>{"HR", "TJ", "DJ" };
List<string> sorted = input.Sort(hierarchy); // <-- does not compile. How do I sort by the hierarchy?
// ...and if the sort worked as desired, these assert statements would return true:
Assert.AreEqual("HR", sorted[0]);
Assert.AreEqual("HR", sorted[1]);
Assert.AreEqual("TJ", sorted[2]);
Assert.AreEqual("TJ", sorted[3]);
Assert.AreEqual("DJ", sorted[4]);
}

Another way to do it:
var hierarchy = new Dictionary<string, int>{
{ "HR", 1},
{ "TJ", 2},
{ "DJ", 3} };
var sorted = input.OrderBy(s => hierarchy[s]).ToList();

There are so many ways to do this.
Hard-coding a dictionary is not ideal, especially when you already have a list of the values in the order that you want (i.e. List<string> hierarchy = new List<string>{"HR", "TJ", "DJ" };). A hard-coded dictionary can only be changed by recompiling your program, and it is error-prone because you might mistype a number. It is better to build the dictionary from the list at run time. That way you can adjust your hierarchy at run time and still use it to order your input.
Here's the basic way to create the dictionary:
Dictionary<string, int> indices =
hierarchy
.Select((value, index) => new { value, index })
.ToDictionary(x => x.value, x => x.index);
Then it's an easy sort:
List<string> sorted = input.OrderBy(x => indices[x]).ToList();
However, if the input contains a value that is missing from the hierarchy, this will throw a KeyNotFoundException.
Try with this input:
List<string> input = new List<string> { "TJ", "DJ", "HR", "HR", "TJ", "XX" };
You need to decide if you are removing missing items from the list or concatenating them at the end of the list.
To remove you'd do this:
List<string> sorted =
input
.Where(x => indices.ContainsKey(x))
.OrderBy(x => indices[x])
.ToList();
Or to sort to the end you'd do this:
List<string> sorted =
input
.OrderBy(x => indices.ContainsKey(x) ? indices[x] : int.MaxValue)
.ThenBy(x => x) // optional: sorts the missing items alphabetically among themselves
.ToList();
If you simply want to remove items from input that aren't in hierarchy then there are a couple of other options that might be appealing.
Try this:
List<string> sorted =
(
from x in input
join y in hierarchy.Select((value, index) => new { value, index })
on x equals y.value
orderby y.index
select x
).ToList();
Or this:
ILookup<string, string> lookup = input.ToLookup(x => x);
List<string> sorted = hierarchy.SelectMany(x => lookup[x]).ToList();
Personally, I like this last one. It's a two liner and it doesn't rely on indices at all.
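For reference, here is a minimal sketch that wraps the dictionary approach in one helper and sends unknown items to the end. The class name HierarchySort and the method name SortByHierarchy are made up for illustration and are not part of any library. It assumes the hierarchy has no duplicates and the input has no null strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HierarchySort
{
    // Hypothetical helper: orders `input` by each item's position in `hierarchy`.
    // Items that do not appear in the hierarchy are placed at the end.
    public static List<string> SortByHierarchy(IEnumerable<string> input, IEnumerable<string> hierarchy)
    {
        // Map each hierarchy value to its position.
        // Assumption: no duplicates in hierarchy (ToDictionary throws ArgumentException otherwise).
        Dictionary<string, int> indices = hierarchy
            .Select((value, index) => new { value, index })
            .ToDictionary(x => x.value, x => x.index);

        // OrderBy is a stable sort, so items with the same rank keep their input order.
        return input
            .OrderBy(x => indices.TryGetValue(x, out int i) ? i : int.MaxValue)
            .ToList();
    }
}
```

Calling HierarchySort.SortByHierarchy(input, hierarchy) with the input and hierarchy from the question should satisfy the asserts in the unit test. TryGetValue does the lookup once per item, instead of the ContainsKey check followed by the indexer.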

Related

How to compare list anonymous type to another list

I am trying to use Except to compare two lists, but one of them is a list of an anonymous type.
For Example:
var list1 = customer.Select(x => new { x.ID, x.Name }).ToList();
var list2 = existcustomer.Select(x => x.ID).ToList();
I am trying to compare the IDs of the two lists and return a list of the names from list1 whose IDs are not in list2.
My code:
var checkIfCustomerIdNotExist = list1.Where(x => x.ID.ToList().Except(list2)).Select(x => x.Name).ToList();
I am wondering if there is any workaround.
I'd advise using a dictionary instead of a List:
//create a dictionary of ID to name
var customerDict = customer.ToDictionary(x => x.ID, x => x.Name);
//get your list to compare to
var list2 = existcustomer.Select(x => x.ID).ToList();
//get all entries where the id is not in the second list, and then grab the names
var newNames = customerDict.Where(x => !list2.Contains(x.Key)).Select(x => x.Value).ToList();
In general it's good to think about which data structure suits your data best. In your case list1 contains a list of tuples where ID is used to find Name, so a data structure better suited to that (a dictionary) makes solving your problem much easier and cleaner.
Note that this solution assumes ID is unique. If there are duplicate IDs with different names, you might need a Dictionary<string, List<string>> instead of just a Dictionary<string, string>. That would probably require more than a single line of LINQ, though the code would be fairly similar.
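If duplicate IDs are a concern, one alternative is to leave the customers in a sequence and put only the existing IDs into a HashSet. The sketch below assumes a hypothetical Customer class with an int ID and a string Name, since the question does not show the real types. NamesNotIn is an illustrative name as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's customer objects.
public sealed class Customer
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

public static class CustomerFilter
{
    // Returns the names of all customers whose ID is not in existingIds.
    // Duplicate IDs in `customers` are fine because nothing is keyed by ID.
    public static List<string> NamesNotIn(IEnumerable<Customer> customers, IEnumerable<int> existingIds)
    {
        // HashSet gives O(1) average lookups, unlike List.Contains which scans the list.
        var known = new HashSet<int>(existingIds);

        return customers
            .Where(c => !known.Contains(c.ID))
            .Select(c => c.Name)
            .ToList();
    }
}
```

With the question's variables this would be called as CustomerFilter.NamesNotIn(customer, existcustomer.Select(x => x.ID)), provided the ID types match.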

Highest Number's key in a GroupBy

I have a simple class:
class Balls
{
public int BallType;
}
And I have a really simple list:
var balls = new List<Balls>()
{
new Balls() { BallType = 1},
new Balls() { BallType = 1},
new Balls() { BallType = 1},
new Balls() { BallType = 2}
};
I've used GroupBy on this list and I want to get back the key of the group with the highest count.
After I used x.GroupBy(q => q.BallType) I tried to use .Max(), but it returns 3 and I need the key, which is 1.
I also tried to use Console.WriteLine(x.GroupBy(q => q.BallType).Max().Key); but it throws System.ArgumentException.
Here's what I came up with:
var mostCommonBallType = balls
.GroupBy(k => k.BallType)
.OrderBy(g => g.Count())
.Last().Key;
You group by the BallType, order the groups by their item count, take the last group (since OrderBy sorts in ascending order, the most common value is last) and then return its key.
Someone came up with the idea of ordering the sequence:
var mostCommonBallType = balls
.GroupBy(k => k.BallType)
.OrderBy(g => g.Count())
.Last().Key;
Apart from the fact that it is more efficient to use OrderByDescending and then take FirstOrDefault, this also gets you into trouble if your collection of Balls is empty: Last() throws an InvalidOperationException on an empty sequence.
If you use a different overload of GroupBy, you won't have these problems:
var mostCommonBallType = balls.GroupBy(
// KeySelector:
k => k.BallType,
// ResultSelector:
(ballType, ballsWithThisBallType) => new
{
BallType = ballType,
Count = ballsWithThisBallType.Count(),
})
.OrderByDescending(group => group.Count)
.Select(group => group.BallType)
.FirstOrDefault();
This solves the previously mentioned problems. However, if you only need the first element, why sort the second, the third and all the remaining elements? Using Aggregate instead of OrderByDescending walks the groups only once, without sorting them:
Assuming your collection is not empty:
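var mostCommonBallType = balls
.GroupBy(k => k.BallType,
(ballType, ballsWithThisBallType) => new { BallType = ballType, Count = ballsWithThisBallType.Count() })
.Aggregate((groupWithHighestBallCount, nextGroup) =>
(groupWithHighestBallCount.Count >= nextGroup.Count) ?
groupWithHighestBallCount : nextGroup)
.BallType; // Aggregate returns a single group, so no Select or FirstOrDefault is needed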
Aggregate takes the first element of your non-empty sequence and assigns it to groupWithHighestBallCount. Then it iterates over the rest of the sequence and compares each nextGroup.Count with groupWithHighestBallCount.Count. It keeps the one with the highest value as the next groupWithHighestBallCount. The return value is the final groupWithHighestBallCount.
See that Aggregate only enumerates once?
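If the collection might be empty, one option is the Aggregate overload that takes a seed value. The sketch below is only one way to write it: MostCommon and MostCommonKey are illustrative names, and returning null for an empty sequence is a design choice of this example, not something from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MostCommon
{
    // Returns the key that occurs most often, or null when `keys` is empty.
    // On a tie, the key that appears first in the input wins.
    public static int? MostCommonKey(IEnumerable<int> keys)
    {
        return keys
            .GroupBy(k => k)
            .Select(g => (Key: (int?)g.Key, Count: g.Count()))
            // The seed (null, 0) is what comes back when there are no groups at all.
            .Aggregate(
                (Key: (int?)null, Count: 0),
                (best, next) => next.Count > best.Count ? next : best)
            .Key;
    }
}
```

For the question's list this would be called as MostCommon.MostCommonKey(balls.Select(b => b.BallType)). Because of the seed, Aggregate no longer throws on an empty sequence.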

LINQ returns List<{int,double}> with two values after .Select, but I need List<int> with one value only

I have got this assignment. I need to create a method which works with JSON data in this form:
On input N, what is top N of movies? The score of a movie is its average rate
So I have a JSON file with 5 mil. movies inside. Each row looks like this:
{ Reviewer:1, Movie:1535440, Grade:1, Date:'2005-08-18'},
{ Reviewer:1, Movie:1666666, Grade:2, Date:'2006-09-20'},
{ Reviewer:2, Movie:1535440, Grade:3, Date:'2008-05-10'},
{ Reviewer:3, Movie:1535440, Grade:5, Date:'2008-05-11'},
This file is deserialized and then saved as an IEnumerable. Then I wanted to create a method which returns a List<int> where each int is a MovieId. The movies in the list are ordered by descending average grade, and the number of "top" movies is specified as a parameter of the method.
My method looks like this:
public List<int> GetSpecificAmountOfBestMovies(int amountOfMovies)
{
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.ToList();
var moviesSortedList = new List<int>();
foreach (var movie in moviesAndAverageGradeSortedList)
{
var key = movie.Key;
moviesSortedList.Add(key);
}
return moviesSortedList;
}
So moviesAndAverageGradeSortedList is a List<{int,double}> because of the .Select call. I cannot return it directly, because the method's return type is List<int>: I want only the movieIds, not their average grades.
So I created a new List<int> and a foreach loop which goes through moviesAndAverageGradeSortedList and saves only the Keys from that list.
I think this solution is not correct, because the foreach loop could be very slow when I pass a big number as the parameter. Does somebody know how I can get the Keys (movieIds) from the first list and thereby avoid creating another List<int> and the foreach loop?
I will be thankful for every solution.
You can avoid the second list creation by just adding another .Select after the ordering. Also to make it all a bit cleaner you could:
return _deserializator.RatingCollection()
.GroupBy(i => i.Movie)
.OrderByDescending(g => g.Average(i => i.Grade))
.Select(g => g.Key)
.Take(amountOfMovies)
.ToList();
Note that this won't really improve performance much (if at all), because even in your original implementation the second list is created only from the first n items. The expensive operations are the grouping and the ordering by each group's average, and those have to run over all items in the JSON file regardless of the number of items you want to return.
You could add another Select after you have ordered the list by average:
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.Select(s=> s.Key)
.ToList();
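To try this out without the 5 million row file, here is a small self-contained sketch. The Rating class is an assumption modelled on the JSON rows in the question (the real deserialized type is not shown), and TopMovies.GetTopMovies is an illustrative name. The ThenBy tie-break on the movie id is also an addition of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one deserialized row; the question does not show the real class.
public sealed class Rating
{
    public int Reviewer { get; set; }
    public int Movie { get; set; }
    public int Grade { get; set; }
}

public static class TopMovies
{
    // Returns the ids of the `amountOfMovies` movies with the highest average grade.
    public static List<int> GetTopMovies(IEnumerable<Rating> ratings, int amountOfMovies)
    {
        return ratings
            .GroupBy(r => r.Movie)
            .OrderByDescending(g => g.Average(r => r.Grade))
            .ThenBy(g => g.Key)        // assumption: break ties by the lower movie id
            .Take(amountOfMovies)
            .Select(g => g.Key)        // keep only the movie id, drop the grades
            .ToList();
    }
}
```

With the four sample rows from the question, movie 1535440 has grades 1, 3 and 5 (average 3) and movie 1666666 has a single grade of 2, so asking for the top 1 gives a list containing only 1535440.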

order objects by given values

Given:
class C
{
public string Field1;
public string Field2;
}
template = new [] { "str1", "str2", ... }.ToList() // presents allowed values for C.Field1 as well as order
list = new List<C> { ob1, ob2, ... }
Question:
How can I perform Linq's
list.OrderBy(x => x.Field1)
which will use template above for order (so objects with Field1 == "str1" come first, then objects with "str2" and so on)?
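In LINQ to Objects, use Array.IndexOf (this needs template to be an array; if it is a List<string>, as in the question, call template.IndexOf(x.Field1) instead):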
var ordered = list
.Select(x => new { Obj = x, Index = Array.IndexOf(template, x.Field1)})
.OrderBy(p => p.Index < 0 ? 1 : 0) // Items with missing text go to the end
.ThenBy(p => p.Index) // The actual ordering happens here
.Select(p => p.Obj); // Drop the index from the result
This wouldn't work in EF or LINQ to SQL, so you would need to bring objects into memory for sorting.
Note: The above assumes that the template is not exhaustive, i.e. some Field1 values may be missing from it. If every Field1 value is guaranteed to be in the template, a simpler query is sufficient:
var ordered = list.OrderBy(x => Array.IndexOf(template, x.Field1));
I think IndexOf might work here:
list.OrderBy(_ => Array.IndexOf(template, _.Field1))
Please note that it will return -1 when object is not present at all, which means it will come first. You'll have to handle this case. If your field is guaranteed to be there, it's fine.
As others have said, Array.IndexOf should do the job just fine. However, if template and/or list is long, it might be worthwhile transforming your template into a dictionary. Something like:
var templateDict = template.Select((item,idx) => new { item, idx })
.ToDictionary(k => k.item, v => v.idx);
(or you could just start by creating a dictionary instead of an array in the first place - it's more flexible when you need to reorder stuff)
This will give you a dictionary keyed off the string from template with the index in the original array as your value. Then you can sort like this:
var ordered = list.OrderBy(x => templateDict[x.Field1]);
Since lookups in a dictionary are O(1), this will scale better as template and list grow.
Note: The above code assumes all values of Field1 are present in template. If they are not, you would have to handle the case where x.Field1 isn't in templateDict.
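One possible way to handle that case is to look the key up with TryGetValue and fall back to int.MaxValue, so that objects whose Field1 is not in the template end up last. This is only a sketch: TemplateOrder and OrderByTemplate are made-up names, and it assumes Field1 is never null and the template has no duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the class in the question.
public sealed class C
{
    public string Field1;
    public string Field2;
}

public static class TemplateOrder
{
    // Orders `list` by the position of Field1 in `template`.
    // Objects whose Field1 is not in the template are placed at the end.
    public static List<C> OrderByTemplate(IEnumerable<C> list, IEnumerable<string> template)
    {
        var templateDict = template
            .Select((item, idx) => new { item, idx })
            .ToDictionary(k => k.item, v => v.idx);

        return list
            .OrderBy(x => templateDict.TryGetValue(x.Field1, out int position) ? position : int.MaxValue)
            .ToList();
    }
}
```

Whether unknown values should go last, go first, or cause an exception depends on your data, so treat the int.MaxValue fallback as just one choice.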
var orderedList = list.OrderBy(d => Array.IndexOf(template, d.MachingColumnFromTempalate) < 0 ? int.MaxValue : Array.IndexOf(template, d.MachingColumnFromTempalate)).ToList();
I've actually written a method to do this before. Here's the source:
public static IOrderedEnumerable<T> OrderToMatch<T, TKey>(this IEnumerable<T> source, Func<T, TKey> sortKeySelector, IEnumerable<TKey> ordering)
{
var orderLookup = ordering
.Select((x, i) => new { key = x, index = i })
.ToDictionary(k => k.key, v => v.index);
if (!orderLookup.Any())
{
throw new ArgumentException("Ordering collection cannot be empty.", nameof(ordering));
}
T[] sourceArray = source.ToArray();
return sourceArray
.OrderBy(x =>
{
int index;
if (orderLookup.TryGetValue(sortKeySelector(x), out index))
{
return index;
}
return Int32.MaxValue;
})
.ThenBy(x => Array.IndexOf(sourceArray, x));
}
You can use it like this:
var ordered = list.OrderToMatch(x => x.Field1, template);
If you want to see the source, the unit tests, or the library it lives in, you can find it on GitHub. It's also available as a NuGet package.

best way to receive a list of objects from list of KeyValuePair?

I have a list of KeyValuePairs and I want to filter it based on the key value so eventually I will get a list of values which is filtered (meaning - will not contain all the values that were in the original list).
I guess maybe the best way is some form of Lambda expression but I am not sure how to achieve it.
Thanks,
Alon
Try this:
var values = list.Where(x => x.Key == "whatever").Select(x => x.Value);
This will give you a filtered list of the values only.
Obviously you can change the way you filter your keys.
Use the following:
var filteredList = list.Where(x => x.Key == "Key");
What you're looking for is some combination of LINQ extension methods (which ones depends on what exactly you're trying to do).
For example, if I had a list mapping fruits to their colors and wanted to get a collection of the fruits that are red, I would do something like:
var fruits = new List<KeyValuePair<string,string>>() {
new KeyValuePair<string,string>("Apple", "Green"),
new KeyValuePair<string,string>("Orange", "Orange"),
new KeyValuePair<string,string>("Strawberry", "Red"),
new KeyValuePair<string,string>("Cherry", "Red")
};
var redFruits = fruits.Where(kvp => kvp.Value == "Red").Select(kvp => kvp.Key);
// this would result in an IEnumerable<string> { "Strawberry", "Cherry" }
