Query Nested Dictionary - c#

I was curious if anyone had a good way to solve this problem efficiently. I currently have the following object:
Dictionary<int, Dictionary<double, CustomStruct>>
struct CustomStruct
{
double value1;
double value2;
...
}
Given that I know the 'int' key I want to access, I need to return the 'double' key of the inner dictionary entry whose CustomStruct has the lowest sum of (value1 + value2). I was trying to use LINQ, but any approach would be appreciated.

var result = dict[someInt].MinBy(kvp => kvp.Value.value1 + kvp.Value.value2).Key;
using the MinBy Extension Method from the awesome MoreLINQ project.

Using just plain LINQ:
Dictionary<int, Dictionary<double, CustomStruct>> dict = ...;
int id = ...;
var minimum =
(from kvp in dict[id]
// group the keys (double) by their sums
group kvp.Key by kvp.Value.value1 + kvp.Value.value2 into g
orderby g.Key // sort group keys (sums) in ascending order
select g.First()) // select the first key (double) in the group
.First(); // return first key in the sorted collection of keys
Whenever you want to get the minimum or maximum item using plain LINQ, you usually have to do it with a combination of GroupBy(), OrderBy() and First()/Last().
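For example, the maximum counterpart of the query above differs only in the sort direction (a sketch, reusing the same dict and id variables):
var maximum =
    (from kvp in dict[id]
     // group the keys (double) by their sums
     group kvp.Key by kvp.Value.value1 + kvp.Value.value2 into g
     orderby g.Key descending // largest sum first
     select g.First())
    .First();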

A Dictionary<TKey,TValue> is also a sequence of KeyValuePair<TKey,TValue>. You can select the KeyValuePair with the least sum of values and get its key.
Using pure LINQ to Objects:
dict[someInt].OrderBy(item => item.Value.value1 + item.Value.value2)
             .Select(item => item.Key)
             .FirstOrDefault();

Here is the non-LINQ way. It is not shorter than its LINQ counterparts, but it is much more efficient because it does no sorting, unlike most LINQ solutions, which may turn out expensive if the collection is large.
The MinBy solution from dtb is a good one, but it requires an external library. I do like LINQ a lot, but sometimes you should remind yourself that a foreach loop with a few local variables is not archaic or an error.
CustomStruct Min(Dictionary<double, CustomStruct> input)
{
CustomStruct lret = default(CustomStruct);
double lastSum = double.MaxValue;
foreach (var kvp in input)
{
var other = kvp.Value;
var newSum = other.value1 + other.value2;
if (newSum < lastSum)
{
lastSum = newSum;
lret = other;
}
}
return lret;
}
If you want to use the LINQ method without using an external library, you can create your own MinBy like this one:
public static class Extensions
{
public static T MinBy<T>(this IEnumerable<T> coll, Func<T,double> criteria)
{
T lret = default(T);
double last = double.MaxValue;
foreach (var v in coll)
{
var newLast = criteria(v);
if (newLast < last)
{
last = newLast;
lret = v;
}
}
return lret;
}
}
It is not as efficient as the first one, but it does the job and is more reusable and composable than the first one. Your solution with Aggregate is innovative, but it recalculates the sum of the current best match for every item it is compared against, because not enough state is carried between the Aggregate calls.
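The Aggregate answer below could avoid that recalculation by carrying the running sum in the accumulator; a sketch, assuming the same dict and someInt as above:
var bestKey = dict[someInt]
    .Aggregate(
        new { Key = 0.0, Sum = double.MaxValue },   // seed: no match yet
        (acc, kvp) =>
        {
            var sum = kvp.Value.value1 + kvp.Value.value2;
            // keep whichever of the two has the smaller (cached) sum
            return sum < acc.Sum ? new { kvp.Key, Sum = sum } : acc;
        })
    .Key;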

Thanks for all the help guys, found out this way too:
dict[someInt].Aggregate(
    (seed, o) =>
    {
        var v = seed.Value.value1 + seed.Value.value2;
        var k = o.Value.value1 + o.Value.value2;
        return v < k ? seed : o;
    }).Key;

Related

Find the first item in a list which gives a value, and return the value

How could the following iterative algorithm be rewritten using Linq?
int GetMatchingValue(List<Thing> things)
{
    foreach (Thing t in things)
    {
        var value = t.ComplicatedCalculation();
        if (value > 0)
            return value;
    }
    return 0; // nothing matched
}
It's easy to find which Thing we want with LINQ:
var thing = things.FirstOrDefault(t => t.ComplicatedCalculation() > 0);
But then you have to do the check again to see which value to return:
return thing?.ComplicatedCalculation() ?? 0;
Is there a way to return the calculated value using LINQ without having to do it twice? Or is this a case where iterating the list is the simplest/cleanest solution? I would welcome a solution employing MoreLINQ also.
Use LINQ to first transform to the value you care about, then find the one you want to return:
int GetMatchingValue(List<Thing> things)
=> things.Select(t => t.ComplicatedCalculation()).FirstOrDefault(v => v > 0);

C# distinct List<string> by substring

I want to remove duplicates from a list of strings. I do this by using Distinct, but I want to ignore the first char when comparing.
I already have working code that deletes the duplicates, but it also deletes the first char of every string.
List<string> mylist = new List<string>();
List<string> newlist =
mylist.Select(e => e.Substring(1, e.Length - 1)).Distinct().ToList();
Input:
"1A","1B","2A","3C","4D"
Output:
"A","B","C","D"
Desired output:
"1A","1B","3C","4D" (it doesn't matter if "1A" or "2A" will be deleted)
I guess I am pretty close but.... any input is highly appreciated!
As always a solution should work as fast as possible ;)
You can implement an IEqualityComparer<string> that compares your strings while ignoring the first letter. Then pass it to the Distinct method.
myList.Distinct(new MyComparer());
There is also an example on MSDN that shows you how to implement and use a custom comparer with Distinct.
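A minimal sketch of what such a comparer could look like (the MyComparer name matches the call above; the skip-one-character rule is taken from the question):
class MyComparer : IEqualityComparer<string>
{
    // Two strings are equal if they match after dropping the first character.
    public bool Equals(string x, string y) => Key(x) == Key(y);

    // Hash on the same key so equal strings land in the same bucket.
    public int GetHashCode(string s) => Key(s).GetHashCode();

    private static string Key(string s) => s.Length < 2 ? s : s.Substring(1);
}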
You can GroupBy all but the first character and take the first of every group:
List<string> result= mylist.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
.Select(g => g.First())
.ToList();
Result:
Console.Write(string.Join(",", result)); // 1A,1B,3C,4D
it doesn't matter if "1A" or "2A" will be deleted
If you change your mind you have to replace g.First() with the new logic.
However, if performance really matters and it is never important which duplicate gets deleted, you should prefer Selman's approach of writing a custom IEqualityComparer<string>. That will be more efficient than my GroupBy approach if its GetHashCode is implemented like:
return (s.Length < 2 ? s : s.Substring(1)).GetHashCode();
I'm going to suggest a simple extension that you can reuse in similar situations
public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> This, Func<T, U> keySelector)
{
var set = new HashSet<U>();
foreach (var item in This)
{
if (set.Add(keySelector(item)))
yield return item;
}
}
This is basically how Distinct is implemented in LINQ.
Usage:
List<string> newlist =
mylist.DistinctBy(e => e.Substring(1, e.Length - 1)).ToList();
I realise the answer has already been given, but since I was working on this answer anyway I'm still going to post it, in case it's any use.
If you really want the fastest solution for large lists, then something like this might be optimal. You would need to do some accurate timings to be sure, though!
This approach does not make any additional string copies when comparing or computing the hash codes:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo
{
internal static class Program
{
static void Main()
{
var myList = new List<string>
{
"1A",
"1B",
"2A",
"3C",
"4D"
};
var newList = myList.Distinct(new MyComparer());
Console.WriteLine(string.Join("\n", newList));
}
sealed class MyComparer: IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (x.Length != y.Length)
return false;
if (x.Length == 0)
return true;
return (string.Compare(x, 1, y, 1, x.Length) == 0);
}
public int GetHashCode(string s)
{
if (s.Length <= 1)
return 0;
int result = 17;
unchecked
{
bool first = true;
foreach (char c in s)
{
if (first)
first = false;
else
result = result*23 + c;
}
}
return result;
}
}
}
}

Count numbers in a List

In C# I have a List which contains numbers in string format. What is the best way to count the occurrences of each number? For example, to tell that I have the number ten three times.
I mean, in Unix awk you can say something like
tempArray["5"] += 1
which is similar to a KeyValuePair, except that a KeyValuePair is read-only.
Any fast and smart way?
Very easy with LINQ:
var occurrenciesByNumber = list.GroupBy(x => x)
.ToDictionary(x => x.Key, x => x.Count());
Of course, since your numbers are represented as strings, this code distinguishes for instance between "001" and "1" even though they are conceptually the same number.
To count numbers that have the same value, you could do for example:
var occurrenciesByNumber = list.GroupBy(x => int.Parse(x))
.ToDictionary(x => x.Key, x => x.Count());
(As noted in digEmAll's answer, I'm assuming you don't really care that they're numbers - everything here assumes that you wanted to treat them as strings.)
The simplest way to do this is to use LINQ:
var dictionary = values.GroupBy(x => x)
.ToDictionary(group => group.Key, group => group.Count());
You could build the dictionary yourself, like this:
var map = new Dictionary<string, int>();
foreach (string number in list)
{
int count;
// You'd normally want to check the return value, but in this case you
// don't care.
map.TryGetValue(number, out count);
map[number] = count + 1;
}
... but I prefer the conciseness of the LINQ approach :) It will be a bit less efficient, mind you - if that's a problem, I'd personally probably create a generic "counting" extension method:
public static Dictionary<T, int> GroupCount<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var map = new Dictionary<T, int>();
foreach (T value in source)
{
int count;
map.TryGetValue(value, out count);
map[value] = count + 1;
}
return map;
}
(You might want another overload accepting an IEqualityComparer<T>.) Having written this once, you can reuse it any time you need to get the counts for items:
var counts = list.GroupCount();
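A sketch of the overload mentioned above, which simply passes the comparer through to the dictionary:
public static Dictionary<T, int> GroupCount<T>(this IEnumerable<T> source,
                                               IEqualityComparer<T> comparer)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    // Dictionary<TKey,TValue> accepts a custom comparer in its constructor.
    var map = new Dictionary<T, int>(comparer);
    foreach (T value in source)
    {
        int count;
        map.TryGetValue(value, out count);
        map[value] = count + 1;
    }
    return map;
}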

Searching with Linq

I have a collection of objects, each with an int Frame property. Given an int, I want to find the object in the collection that has the closest Frame.
Here is what I'm doing so far:
public static void Search(int frameNumber)
{
var differences = (from rec in _records
select new { FrameDiff = Math.Abs(rec.Frame - frameNumber), Record = rec }).OrderBy(x => x.FrameDiff);
var closestRecord = differences.FirstOrDefault().Record;
//continue work...
}
This is great and everything, except there are 200,000 items in my collection and I call this method very frequently. Is there a relatively easy, more efficient way to do this?
var closestRecord = _records.MinBy(rec => Math.Abs(rec.Frame - frameNumber));
using MinBy from MoreLINQ.
What you might want to try is to store the records in a data structure that is sorted by Frame. Then you can do a binary search when you need to find the closest one to a given frameNumber.
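A sketch of that idea, assuming _records and the Record type with its int Frame property from the question; the FindClosest name is illustrative:
// Build these once (and rebuild whenever _records changes):
Record[] sortedRecords = _records.OrderBy(r => r.Frame).ToArray();
int[] frames = sortedRecords.Select(r => r.Frame).ToArray();

Record FindClosest(int frameNumber)
{
    int i = Array.BinarySearch(frames, frameNumber);
    if (i >= 0)
        return sortedRecords[i];                   // exact match
    i = ~i;                                        // index of first frame larger than frameNumber
    if (i == 0)
        return sortedRecords[0];                   // smaller than everything
    if (i == frames.Length)
        return sortedRecords[frames.Length - 1];   // larger than everything
    // Pick the nearer of the two neighbours.
    return (frameNumber - frames[i - 1]) <= (frames[i] - frameNumber)
        ? sortedRecords[i - 1]
        : sortedRecords[i];
}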
I don't know that I would use LINQ for this, at least not with an orderby.
static Record FindClosestRecord(IEnumerable<Record> records, int number)
{
Record closest = null;
int leastDifference = int.MaxValue;
foreach (Record record in records)
{
int difference = Math.Abs(number - record.Frame);
if (difference == 0)
{
return record; // exact match, return early
}
else if (difference < leastDifference)
{
leastDifference = difference;
closest = record;
}
}
return closest;
}
You can combine your statements into one, like so:
var closestRecord = (from rec in _records
select new { FrameDiff = Math.Abs(rec.Frame - frameNumber),
Record = rec }).OrderBy(x => x.FrameDiff).FirstOrDefault().Record;
Maybe you could divide your big item list into 5-10 smaller lists that are ordered by their Frame values or something? That way the search is faster if you know which list you need to search.

Better performance on updating objects with linq

I have two lists of custom objects and want to update a field for all objects in one list if there is an object in the other list that matches on another pair of fields.
This code explains the problem better and produces the results I want. However, for larger lists (20k items, with a matching 20k-item list), this takes considerable time (31 s). I can improve this by ~50% by using the generic list's Find(Predicate) method.
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Collections.Generic;
namespace ExperimentFW3
{
public class PropValue
{
public string Name;
public decimal Val;
public decimal Total;
}
public class Adjustment
{
public string PropName;
public decimal AdjVal;
}
class Program
{
static List<PropValue> propList;
static List<Adjustment> adjList;
public static void Main()
{
propList = new List<PropValue>{
new PropValue{Name = "Alfa", Val=2.1M},
new PropValue{Name = "Beta", Val=1.0M},
new PropValue{Name = "Gamma", Val=8.0M}
};
adjList = new List<Adjustment>{
new Adjustment{PropName = "Alfa", AdjVal=-0.1M},
new Adjustment{PropName = "Beta", AdjVal=3M}
};
foreach (var p in propList)
{
Adjustment a = adjList.SingleOrDefault(
av => av.PropName.Equals(p.Name)
);
if (a != null)
p.Total = p.Val + a.AdjVal;
else
p.Total = p.Val;
}
}
}
}
The desired result is: Alfa total=2,Beta total=4,Gamma total=8
But I wonder if this is possible to do even faster. Inner joining the two lists takes very little time, even when looping over 20k items in the resultset.
var joined = from p in propList
join a in adjList on p.Name equals a.PropName
select new { p.Name, p.Val, p.Total, a.AdjVal };
So my question is if it's possible to do something like I would do with T-SQL? An UPDATE from a left join using ISNULL(val,0) on the adjustment value.
That join should be fairly fast, as it will first loop through all of adjList to create a lookup, then for each element in propList it will just use the lookup. This is faster than your O(N * M) method in the larger code - although that could easily be fixed by calling ToLookup (or ToDictionary as you only need one value) on adjList before the loop.
EDIT: Here's the modified code using ToDictionary. Untested, mind you...
var adjDictionary = adjList.ToDictionary(av => av.PropName);
foreach (var p in propList)
{
Adjustment a;
if (adjDictionary.TryGetValue(p.Name, out a))
{
p.Total = p.Val + a.AdjVal;
}
else
{
p.Total = p.Val;
}
}
If adjList might have duplicate names, you should group the items before pushing them into the dictionary.
Dictionary<string, decimal> adjDictionary = adjList
    .GroupBy(a => a.PropName)
    .ToDictionary(g => g.Key, g => g.Sum(a => a.AdjVal));
propList.ForEach(p =>
{
decimal a;
adjDictionary.TryGetValue(p.Name, out a);
p.Total = p.Val + a;
});
I know I am late posting this, but I thought someone would appreciate the clearer, shorter answer below, which handles multiple records per key in adjList. Creating a Lookup allows fast lookups on multiple items and returns an empty sequence when a key has no records.
var adjLookUp = adjList.ToLookup(a => a.PropName);
foreach (var p in propList)
p.Total = p.Val + adjLookUp[p.Name].Sum(a => a.AdjVal);
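For reference, the T-SQL-style "update from a left join" that the question asks about can be written with GroupJoin/DefaultIfEmpty; a sketch, assuming at most one adjustment per name (the null check plays the role of ISNULL(val, 0)):
var joined = from p in propList
             join a in adjList on p.Name equals a.PropName into matches
             from m in matches.DefaultIfEmpty()   // left join: m is null when no match
             select new { Prop = p, Adj = m };

foreach (var x in joined)
    x.Prop.Total = x.Prop.Val + (x.Adj != null ? x.Adj.AdjVal : 0);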
