Dictionary<> value count in C#

I have a dictionary object like this:
var dictionary = new Dictionary<string, List<int>>();
The number of keys is not very large, but the list of integers in each value can be quite large (on the order of thousands).
Given a list of keys (keylist), I need to count the number of times each integer appears across the lists for those keys and return the integers ordered by frequency.
Output:
{int1, count1}
{int2, count2}
...
This is the solution I have come up with:
var query = _keylist.SelectMany(
n=>_dictionary[n]).GroupBy(g=>g).Select(
g=> new[] {g.Key, g.Count()}).OrderByDescending(g=>g[1]);
Although this query produces the desired result, it's not very efficient.
Is there a clever way to produce the same result with less processing?

I would do it this way:
var query =
from k in _keylist
from v in _dictionary[k]
group v by v into gvs
let result = new
{
key = gvs.Key,
count = gvs.Count(),
}
orderby result.count descending
select result;
To me this is quite straightforward and simple, and well worth accepting any (minor) performance hit from using LINQ.
An alternative approach that doesn't create the large list of groups would be to do this:
var query =
_keylist
.SelectMany(k => _dictionary[k])
.Aggregate(
new Dictionary<int, int>(),
(d, v) =>
{
if (d.ContainsKey(v))
{
d[v] += 1;
}
else
{
d[v] = 1;
}
return d;
})
.OrderByDescending(kvp => kvp.Value)
.Select(kvp => new
{
key = kvp.Key,
count = kvp.Value,
});

From an algorithmic space- and time-usage point of view, the only thing I see that is suboptimal is the use of GroupBy when you don't actually need the groups (only the group counts). You can use the following extension method instead.
public static Dictionary<K, int> CountBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keySelector)
{
return source.SumBy(keySelector, item => 1);
}
public static Dictionary<K, int> SumBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keySelector,
Func<T, int> valueSelector)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (keySelector == null)
{
throw new ArgumentNullException("keySelector");
}
var dictionary = new Dictionary<K, int>();
foreach (var item in source)
{
var key = keySelector(item);
int count;
if (!dictionary.TryGetValue(key, out count))
{
count = 0;
}
dictionary[key] = count + valueSelector(item);
}
return dictionary;
}
Note the advantage is that the lists of numbers are enumerated but not stored. Only the counts are stored. Note also that the keySelector parameter is not even necessary in your case and I only included it to make the extension method slightly more general.
The usage is then as follows.
var query = _keylist
.SelectMany(k => _dictionary[k])
.CountBy(n => n)
.OrderByDescending(p => p.Value);
This will get you a sequence of KeyValuePair<int, int> where the Key is the number from your original lists and the Value is the count.
To more efficiently handle a sequence of queries, you can preprocess your data.
Dictionary<string, Dictionary<int, int>> preprocessedDictionary
= _dictionary.ToDictionary(p => p.Key, p => p.Value.CountBy(n => n));
Now you can perform a query more efficiently.
var query = _keylist
.SelectMany(k => preprocessedDictionary[k])
.SumBy(p => p.Key, p => p.Value)
.OrderByDescending(p => p.Value);
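For comparison, the counting idea can also be written as a plain loop without any extension methods. The following is only a minimal sketch; the names FrequencyCounter and CountFrequencies are hypothetical and do not come from the question, and it assumes every requested key exists in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencyCounter
{
    // Hypothetical helper: counts how often each integer occurs across the
    // lists selected by keys and returns the counts, most frequent first.
    public static List<KeyValuePair<int, int>> CountFrequencies(
        Dictionary<string, List<int>> data,
        IEnumerable<string> keys)
    {
        var counts = new Dictionary<int, int>();
        foreach (var key in keys)
        {
            // Assumption: key is present; data[key] throws KeyNotFoundException otherwise.
            foreach (var value in data[key])
            {
                int current;
                counts.TryGetValue(value, out current); // current stays 0 when value is new
                counts[value] = current + 1;
            }
        }
        return counts.OrderByDescending(p => p.Value).ToList();
    }
}
```

Only one counter per distinct integer is stored, so memory grows with the number of distinct values rather than with the total length of the lists.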

Related

Linq aggregate dictionary into new dictionary

class Key { string s; int i; }
Given a Dictionary<Key,int> I want a new Dictionary<string,int> that is a mapping of the minimum dictionary value for each Key.s over all keys.
I feel like this should be easy but I just can't get it.
Thanks
clarification:
var dict = new Dictionary<Key,int>();
dict.Add(new Key("a", 123), 19);
dict.Add(new Key("a", 456), 12);
dict.Add(new Key("a", 789), 13);
dict.Add(new Key("b", 998), 99);
dict.Add(new Key("b", 999), 11);
and I want to produce the dictionary:
"a" -> 12
"b" -> 11
hope that helps.
I'm not clear on exactly what you're trying to do, but you can do a mapping from one dictionary to another with .Select(... and/or .ToDictionary(...
For example:
Dictionary<Key, int> original = ...
Dictionary<string, int> mapped = original.ToDictionary((kvp) => kvp.Key.s, (kvp) => kvp.Key.i);
If you improve your question to be more clear, I'll improve my answer.
EDIT: (question was clarified)
var d = dict.GroupBy(kvp => kvp.Key.s).ToDictionary(g => g.Key, g => g.Min(k => k.Value));
You want to group by the key's s property, then select the minimum of the dictionary values in each group as the new dictionary value.
A more generic method that skips the Lookup created by .GroupBy:
public static Dictionary<K, V> aggregateBy<T, K, V>(
this IEnumerable<T> source,
Func<T, K> keySelector,
Func<T, V> valueSelector,
Func<V, V, V> aggregate,
int capacity = 0,
IEqualityComparer<K> comparer = null)
{
var dict = new Dictionary<K, V>(capacity, comparer);
foreach (var t in source)
{
K key = keySelector(t);
V accumulator, value = valueSelector(t);
if (dict.TryGetValue(key, out accumulator))
value = aggregate(accumulator, value);
dict[key] = value;
}
return dict;
}
Sample use:
var dict = new Dictionary<Tuple<string,int>, int>();
dict.Add(Tuple.Create("a", 123), 19);
dict.Add(Tuple.Create("a", 456), 12);
dict.Add(Tuple.Create("a", 789), 13);
dict.Add(Tuple.Create("b", 998), 99);
dict.Add(Tuple.Create("b", 999), 11);
var d = dict.aggregateBy(p => p.Key.Item1, p => p.Value, Math.Min);
Debug.Print(string.Join(", ", d)); // "[a, 12], [b, 11]"

Remove all switched dictionary pairs

I have a Dictionary and want to LINQ-remove all pairs (B, A) if there is a pair (A, B).
Dictionary<int, int> dictionary = new Dictionary<int, int>();
dictionary.Add(1, 2);
dictionary.Add(3, 4); // keep it
dictionary.Add(4, 3); // remove it
//dictionary.Add(4, 3); // remove it (ignore this impossible line, @Rahul Singh is right)
You need to implement a custom equality comparer and use the Distinct method.
Dictionary<int, int> dictionary = new Dictionary<int, int>();
dictionary.Add(1, 2);
dictionary.Add(3, 4);
dictionary.Add(4, 3);
var result = dictionary.Distinct(new KeyValuePairEqualityComparer()).ToDictionary(x => x.Key, x => x.Value);
The equality comparer is defined as
private class KeyValuePairEqualityComparer : IEqualityComparer<KeyValuePair<int, int>>
{
public bool Equals(KeyValuePair<int, int> x, KeyValuePair<int, int> y)
{
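// A pair equals itself and its mirror, so Equals stays reflexive.
return (x.Key == y.Key && x.Value == y.Value) || (x.Key == y.Value && x.Value == y.Key);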
}
public int GetHashCode(KeyValuePair<int, int> obj)
{
// Equality check happens on HashCodes first.
// Multiplying key/value pairs, ensures that mirrors
// are forced to check for equality via the Equals method
return obj.Key * obj.Value;
}
}
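The naive approach would be to simply filter them as you need. Note that, as written, this drops both (A, B) and (B, A), because each is the mirror of the other; add a tie-breaker such as kvp.Key > kvp.Value to the condition if one pair of each mirrored couple should survive.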
dictionary = dictionary
.Where( kvp => !(dictionary.ContainsKey(kvp.Value) && dictionary[kvp.Value]==kvp.Key) )
.ToDictionary( kvp => kvp.Key, kvp => kvp.Value );
Suppose your pair is (1, 2). To remove it from the dictionary you don't need to bother about the value, since keys are unique. You can delete it with: dictionary.Remove(pair.Key);
Note that Dictionary<TKey, TValue>.Remove does not throw when the key is missing; it simply returns false. If you also want to confirm that the key is present (or inspect its value) before removing it, check first:
int value;
if (dictionary.TryGetValue(pair.Key, out value))
{
dictionary.Remove(pair.Key);
}
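If exactly one pair of each mirrored couple should remain, a single pass that remembers the pairs already kept is enough. This is a minimal sketch rather than a definitive solution; MirrorFilter and RemoveMirrored are hypothetical names. It keeps whichever pair is enumerated first, and Dictionary enumeration order is not guaranteed to match insertion order.

```csharp
using System;
using System.Collections.Generic;

public static class MirrorFilter
{
    // Hypothetical helper: copies source, skipping any pair (B, A)
    // whose mirror (A, B) has already been copied.
    public static Dictionary<int, int> RemoveMirrored(Dictionary<int, int> source)
    {
        var result = new Dictionary<int, int>();
        foreach (var pair in source)
        {
            int other;
            // The mirror of (Key, Value) is stored under Value and maps back to Key.
            if (result.TryGetValue(pair.Value, out other) && other == pair.Key)
            {
                continue;
            }
            result.Add(pair.Key, pair.Value);
        }
        return result;
    }
}
```

Each pair costs one hash lookup, so the whole pass is linear in the number of entries.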

How do I take the Top N (percentage) values in a dictionary?

I have a dictionary with a string key and integer value. The value represents the number of occurrences of the key.
How do I create a new dictionary with the keys and values representing the top 25% of values? The sum of the selected values should be equal to or greater than the specified percentage of the sum of all values. For example, if my dictionary contains 5 items with values (5, 3, 2, 1, 1) and I want the top 50%, the new dictionary would contain values (5, 3) because their sum is 8 and that is >= 50% of 12. The dictionary needs to be sorted descending by value and then the top N taken such that their sum meets the specified percentage.
This code gives me the top N but is based on a known count. How do I take into account the desired percentage?
var topItemsCount = dictionary.OrderByDescending(entry => entry.Value)
.Take(topN)
.ToDictionary(pair => pair.Key, pair => pair.Value);
Something like:
var topItemsCount = dictionary.OrderByDescending(entry => entry.Value)
.Take((int)Math.Floor(dictionary.Count * 0.25))
.ToDictionary(pair => pair.Key, pair => pair.Value);
Running .Count on a dictionary returns the number of key-value pairs in the collection. Math.Floor rounds the product down, and the cast to int is needed because Math.Floor returns a double while Take expects an int.
Edited to reflect comments
I would probably just use a simple non-linq solution to achieve what you want. Maybe more verbose, but it's pretty clear to anyone what it does:
var total = dictionary.Sum(e => e.Value);
var cutoff = total * 0.5;
var sum = 0;
var pairs = new List<KeyValuePair<string, int>>();
foreach (var pair in dictionary.OrderByDescending(e => e.Value))
{
sum += pair.Value;
pairs.Add(pair);
if (sum >= cutoff)
break;
}
dictionary = pairs.ToDictionary(pair => pair.Key, pair => pair.Value);
One more edit
If you really want more LINQ, you could try holding the accumulated sum in a class-level variable.
private static int sum = 0;
static void Main(string[] args)
{
var dictionary = new Dictionary<string, int>()
{
{"1",5},
{"2",3},
{"3",2},
{"4",1},
{"5",1},
};
var total = dictionary.Sum(e => e.Value);
var cutoff = total * 0.5;
var filtered = dictionary.OrderByDescending(e => e.Value)
.TakeWhile(e => Add(e.Value).Item1 < cutoff)
.ToDictionary(pair => pair.Key, pair => pair.Value);
}
private static Tuple<int, int> Add(int x)
{
return Tuple.Create(sum, sum += x);
}
It's a bit convoluted with the add function returning a tuple because you are including the first value that breaches the cut off in the result (i.e. even if 5 + 3 = 8 is greater than the cut off 6, you still include 3).
Rephrasing the question in two parts:
1. Given a list of strings and values, find the value that represents the Nth percentage of the total.
2. Given a list of strings and values, and the value representing the Nth percentage, return a new list of strings and values, taken in descending order until their running sum reaches that value.
Question 1 would look like
double percent = inputValue;
double n = dictionary.Values.Sum() * percent;
Question 2 would look like:
Dictionary<string, int> newValues = dictionary.OrderByDescending(_ => _.Value)
.Aggregate(
new {sum = 0.0, values = new Dictionary<string, int>()},
(sumValues, kv) =>
{
if (sumValues.sum <= n)
sumValues.values.Add(kv.Key, kv.Value);
return new {sum = sumValues.sum + kv.Value, values = sumValues.values};
},
sumValues => sumValues.values);
You could also use a for loop and a running sum, but for running totals with limited scope, I like the compactness of the Aggregate function. The downside to this is that the entire source Dictionary is still iterated. A custom iterator method would get around this. For example:
public static class Extensions
{
public static IEnumerable<TThis> TakeGreaterThan<TThis>(this IEnumerable<TThis> source, Func<TThis, double> valueFunc, double compareTo)
{
double sum = 0.0;
IEnumerable<TThis> orderedSource = source.OrderByDescending(valueFunc);
var enumerator = orderedSource.GetEnumerator();
while (sum <= compareTo && enumerator.MoveNext())
{
yield return enumerator.Current;
sum += valueFunc(enumerator.Current);
}
}
}
Used as
Dictionary<string, int> newValues = dictionary.TakeGreaterThan(_ => _.Value, n).ToDictionary(_ => _.Key, _ => _.Value);
Maybe this?
var dictionary = new Dictionary<string, int>()
{
{"1",5},
{"2",3},
{"3",2},
{"4",1},
{"5",1},
};
var max = dictionary.Values.Max();
int percent = 50;
int percentageValue = max*percent /100;
var topItems = dictionary.OrderByDescending(entry => entry.Value)
.TakeWhile(x => x.Value > percentageValue)
.ToDictionary(pair => pair.Key, pair => pair.Value);
foreach (var item in topItems)
{
Console.WriteLine(item.Value);
}
Outputs:
5
3
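For the sum-based reading of the question (take items in descending order until their running sum reaches the requested share of the total), the loop can be packaged as a reusable method. This is a minimal sketch under that assumption; TopShare, TakeTopShare and the fraction parameter are hypothetical names, and fraction is expected to lie between 0 and 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopShare
{
    // Hypothetical helper: returns the largest entries whose running sum
    // first reaches fraction * (sum of all values).
    public static Dictionary<string, int> TakeTopShare(
        Dictionary<string, int> source, double fraction)
    {
        double cutoff = source.Values.Sum() * fraction;
        var result = new Dictionary<string, int>();
        long running = 0;
        foreach (var pair in source.OrderByDescending(e => e.Value))
        {
            // Stop as soon as the entries taken so far already meet the cutoff.
            if (running >= cutoff)
            {
                break;
            }
            result.Add(pair.Key, pair.Value);
            running += pair.Value;
        }
        return result;
    }
}
```

With the values (5, 3, 2, 1, 1) and a fraction of 0.5, the cutoff is 6: the loop takes 5, then 3, and stops once the running sum of 8 has reached it.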

How can I convert a LINQ var to a List?

I am trying to convert a LINQ var to a List. My C# code is:
private List<HyperlinkInfo> GetHyperlinkByCode()
{
TourInfoBusiness obj = new TourInfoBusiness();
List<HyperlinkInfo> lst = new List<HyperlinkInfo>();
lst = obj.GetAllHyperlink();
//lst = lst.Select(x => x.Attraction).ToList();
var k = lst.Select(x => x.Attraction).Distinct();
}
The code above is fine up to the line var k = lst.Select(x => x.Attraction).Distinct(); Now, how can I convert var k to a List?
According to your comments, you need a single HyperlinkInfo object for each Attraction value (which is a string). So, use grouping and ToList():
private List<HyperlinkInfo> GetHyperlinkByCode()
{
TourInfoBusiness obj = new TourInfoBusiness();
List<HyperlinkInfo> lst = obj.GetAllHyperlink();
return lst.GroupBy(x => x.Attraction) // group links by attraction
.Select(g => g.First()) // select first link from each group
.ToList(); // convert result to list
}
Also you can use morelinq DistinctBy extension (available from NuGet):
private List<HyperlinkInfo> GetHyperlinkByCode()
{
TourInfoBusiness obj = new TourInfoBusiness();
List<HyperlinkInfo> lst = obj.GetAllHyperlink();
return lst.DistinctBy(x => x.Attraction).ToList();
}
Use Enumerable.ToList<TSource> Method. Just Add ToList() at the end of your query or
return k.ToList();
So your method can be:
private List<HyperlinkInfo> GetHyperlinkByCode()
{
TourInfoBusiness obj = new TourInfoBusiness();
List<HyperlinkInfo> lst = new List<HyperlinkInfo>();
lst = obj.GetAllHyperlink();
//lst = lst.Select(x => x.Attraction).ToList();
var k = lst.Select(x => x.Attraction).Distinct();
return k.ToList();
}
But for that to compile, x.Attraction would have to be a HyperlinkInfo object.
EDIT: Based on the comments, it appears that x.Attraction is a string, so you need to create an object of your class Project.Bll.HyperlinkInfo in the Select and then return that list. Something like:
var k = lst.Select(x => new Project.Bll.HyperlinkInfo(x.Attraction)).Distinct();
This assumes that the Project.Bll.HyperlinkInfo constructor takes a string parameter. Note that Distinct() removes duplicate objects only if HyperlinkInfo overrides Equals and GetHashCode, or if an IEqualityComparer is supplied.
Use this:
var k = lst.Select(x => x.Attraction).Distinct().ToList();
Now k is List of x.Attraction type. If your x.Attraction is string, use this:
List<string> k = lst.Select(x => x.Attraction).Distinct().ToList();
Use ToList() to your query;
Creates a List<T> from an IEnumerable<T>.
List<HyperlinkInfo> k = lst.Select(x => x.Attraction).Distinct().ToList();
Try adding the DistinctBy method from MoreLINQ:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
and call it in your code:
lst.DistinctBy(x => x.Attraction).ToList();
Try this code:
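return (List<HyperlinkInfo>) k;
Be aware that this cast fails at runtime with an InvalidCastException, because Distinct() returns a lazy iterator rather than a List; calling ToList() is the reliable way to get a list.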
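Much of the confusion in this thread comes from the element type of the result. The sketch below contrasts the two cases side by side. The HyperlinkInfo class here is a hypothetical stand-in (only the Attraction property is known from the question; Url is an assumed extra property for illustration), and HyperlinkQueries and its method names are likewise made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's class.
public class HyperlinkInfo
{
    public string Attraction { get; set; }
    public string Url { get; set; } // assumed property, for illustration only
}

public static class HyperlinkQueries
{
    // Distinct attraction names: the element type becomes string.
    public static List<string> DistinctAttractions(IEnumerable<HyperlinkInfo> links)
    {
        return links.Select(x => x.Attraction).Distinct().ToList();
    }

    // One link per attraction: the element type stays HyperlinkInfo.
    public static List<HyperlinkInfo> FirstPerAttraction(IEnumerable<HyperlinkInfo> links)
    {
        return links.GroupBy(x => x.Attraction)
                    .Select(g => g.First())
                    .ToList();
    }
}
```

The first method fits a return type of List<string>; the second fits the List<HyperlinkInfo> signature from the question.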

Merging sequences by type With LINQ

I want to use LINQ to convert this
IEnumerable<int>[] value1ByType = new IEnumerable<int>[3];
value1ByType[0]= new [] { 0};
value1ByType[1]= new [] {10,11};
value1ByType[2]= new [] {20};
var value2ToType = new Dictionary<int,int> {
{100,0},
{101,1},
{102,2},
{103,1}};
to this
var value2ToValue1 = new Dictionary<int,int> {
{100, 0},
{101,10},
{102,20},
{103,11}};
Is there a way to do this with LINQ? Without LINQ I would use multiple IEnumerators, one for each IEnumerable of value1ByType. like this:
// create enumerators
var value1TypeEnumerators = new List<IEnumerator<int>>();
for (int i = 0; i < value1ByType.Length; i++)
{
value1TypeEnumerators.Add(value1ByType[i].GetEnumerator());
value1TypeEnumerators[i].MoveNext();
}
// create wanted dictionary
var value2ToValue1 = new Dictionary<int, int>();
foreach (var item in value2ToType)
{
int value1=value1TypeEnumerators[item.Value].Current;
value2ToValue1.Add(item.Key, value1);
value1TypeEnumerators[item.Value].MoveNext();
}
Any idea how to do this in LINQ?
Not pure but you can at least do ...
var enumerators = value1ByType.Select(v => v.GetEnumerator()).ToArray();
var value2ToValue1 = value2ToType
.ToDictionary(x => x.Key, x => { enumerators[x.Value].MoveNext(); return enumerators[x.Value].Current; });
But there are so many ways this could go wrong it begs the question - why was the data in those data-structures anyway? and can you fix that instead? How did you end up with exactly the right number of references in the 2nd data structure to elements in the first?
I'm pretty sure that @Hightechrider's solution is more performant than this one, but if you really like the syntactic-sugar way, you can do it like this:
public IDictionary<int, int> MergeSequences(IEnumerable<int>[] value1ByType, Dictionary<int, int> value2ToType)
{
int pos = 0;
var value1ByTypePos = from byType in value1ByType
select new { Pos = pos++, Enumerator = byType.GetEnumerator() };
return (from byType in value1ByTypePos
join toType in value2ToType
on byType.Pos equals toType.Value
select new { toType.Key, Value = byType.Enumerator.GetNext() })
.ToDictionary(pair => pair.Key, pair => pair.Value);
}
I've added an extension method to the IEnumerator interface like this:
public static T GetNext<T>(this IEnumerator<T> enumerator)
{
if (!enumerator.MoveNext())
throw new InvalidOperationException();
return enumerator.Current;
}
Now you have to be aware that any of these solutions can give you slightly different results, depending on the order in which the elements of the dictionary are enumerated. For example, another valid result of this code is:
var value2ToValue1 = new Dictionary<int,int> {
{100, 0},
{103, 10},
{102, 20},
{101, 11}};
Notice that now 101 is paired with 11 and 103 is paired with 10. If this is a problem, then you should use a SortedDictionary<int, int> when defining value2ToType variable.
What you can do for sure is replace the first part with the following:
var value1TypeEnumerators = value1ByType.ToList();
instead of using an enumerator.
If I do not care about performance I could also write:
var value2Ordered = value2ToType.OrderBy(x => x.Value).Select(x=>x.Key);
var value1Ordered = from item in value1ByType from subitem in item select subitem;
var value2ToValue1 = value2Ordered.Zip(value1Ordered, (x, y) => new { Key = x, Value = y })
.ToDictionary(item => item.Key, item => item.Value);
I used the Zip method from a Stack Overflow community wiki; I didn't test this with the built-in .NET 4.0 Zip method.
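Another way to avoid juggling enumerators is to copy each per-type sequence into a Queue<int> and dequeue one value per dictionary entry. This is a minimal sketch, not a tested replacement for the answers above; SequenceMerger and Merge are hypothetical names. It orders the entries by key to make the pairing deterministic, which is an assumption about the desired result, and it relies on each type having at least as many values as there are entries referring to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceMerger
{
    // Hypothetical helper: pairs every key of value2ToType with the next
    // unused value from the sequence of its type.
    public static Dictionary<int, int> Merge(
        IEnumerable<int>[] value1ByType,
        Dictionary<int, int> value2ToType)
    {
        // One queue per type; Dequeue hands out values in their original order.
        var queues = value1ByType.Select(values => new Queue<int>(values)).ToArray();

        return value2ToType
            .OrderBy(p => p.Key) // assumed ordering, for a deterministic pairing
            .ToDictionary(
                p => p.Key,
                // Dequeue throws InvalidOperationException if a type runs out of values.
                p => queues[p.Value].Dequeue());
    }
}
```

As in the enumerator-based answer, the value selector has a side effect, so the result depends on the entries being processed once and in order.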
