Frequency table with zero counts for all values [duplicate]

Possible duplicate: Dictionary returning a default value if the key does not exist
I have a string that contains only digits. I'm interested in generating a frequency table of the digits. Here's an example string:
var candidate = "424256";
This code works, but looking up a digit that is not in the string throws a KeyNotFoundException:
var frequencyTable = candidate
    .GroupBy(x => x)
    .ToDictionary(g => g.Key, g => g.Count());
Which yields:
Key Count
4 2
2 2
5 1
6 1
So, I used this code, which works:
var frequencyTable = (candidate + "1234567890")
    .GroupBy(x => x)
    .ToDictionary(g => g.Key, g => g.Count() - 1);
However, in other use cases, I don't want to have to specify all the possible key values.
Is there an elegant way of inserting 0-count records into the frequencyTable dictionary without resorting to creating a custom collection with this behavior, such as this?
public class FrequencyTable<K> : Dictionary<K, int>
{
    public FrequencyTable(IDictionary<K, int> dictionary)
        : base(dictionary)
    { }
    public new int this[K index]
    {
        get
        {
            if (ContainsKey(index))
                return base[index];
            return 0;
        }
    }
}

If you do not somehow specify all possible key values, your dictionary will not contain an entry for such keys.
Rather than storing zero counts, you may wish to use Dictionary.TryGetValue(...) to test for the existence of the key before trying to access it. If TryGetValue returns false, simply return 0.
You could easily wrap that in an extension method (rather than creating a custom collection).
public static class Extensions
{
    public static int GetFrequencyCount<K>(this Dictionary<K, int> counts, K value)
    {
        int result;
        if (counts.TryGetValue(value, out result))
        {
            return result;
        }
        else
        {
            return 0;
        }
    }
}
Usage:
Dictionary<char, int> counts = new Dictionary<char, int>();
counts.Add('1', 42);
int count = counts.GetFrequencyCount<char>('1');
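On newer runtimes the same lookup-with-default may already be available without a custom extension. A minimal sketch, assuming .NET Core 2.0 or later (where the GetValueOrDefault extension method for dictionaries exists); the names FrequencyLookup, Build and CountOf are made up for illustration and are not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencyLookup
{
    // Builds the table as in the question: only digits that occur get an entry.
    public static Dictionary<char, int> Build(string candidate) =>
        candidate.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());

    // A missing key falls back to default(int), which is 0.
    public static int CountOf(Dictionary<char, int> table, char digit) =>
        table.GetValueOrDefault(digit);

    public static void Main()
    {
        var table = Build("424256");
        Console.WriteLine(CountOf(table, '4')); // digit present in the string
        Console.WriteLine(CountOf(table, '9')); // digit absent; no exception is thrown
    }
}
```

On the older .NET Framework this method is not available, so the TryGetValue wrapper shown above remains the portable choice.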

If there is a pattern for all the possible keys, you can use Enumerable.Range (or a for loop) to generate 0-value keys as a base table, then left join in the frequency data to populate the relevant values:
// test value
var candidate = "424256";
// generate base table of all possible keys
var baseTable = Enumerable.Range('0', '9' - '0' + 1).Select(e => (char)e);
// generate freqTable
var freqTable = candidate.ToCharArray().GroupBy (c => c);
// left join frequency table results to base table
var result =
    from b in baseTable
    join f in freqTable on b equals f.Key into gj
    from subFreq in gj.DefaultIfEmpty()
    select new { Key = b, Value = (subFreq == null) ? 0 : subFreq.Count() };
// convert final result into dictionary
var dict = result.ToDictionary(r => r.Key, r => r.Value);
Sample result:
Key Value
0 0
1 0
2 2
3 0
4 2
5 1
6 1
7 0
8 0
9 0
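If the zero rows do not have to be stored, a Lookup may be enough on its own, because its indexer returns an empty sequence for a key that is not present instead of throwing. A small sketch of that idea (DigitCounts and CountOf are hypothetical names, not part of the answer above):

```csharp
using System;
using System.Linq;

public static class DigitCounts
{
    // ILookup[key] yields an empty sequence for unknown keys,
    // so Count() is 0 rather than an exception.
    public static int CountOf(ILookup<char, char> lookup, char digit) =>
        lookup[digit].Count();

    public static void Main()
    {
        ILookup<char, char> lookup = "424256".ToLookup(c => c);
        foreach (char digit in "0123456789")
        {
            Console.WriteLine($"{digit} {CountOf(lookup, digit)}");
        }
    }
}
```

The trade-off is that each lookup counts the group again, so this suits occasional queries better than tight loops.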

Related

Generate list of unique input values within certain range

I have the following class :
public class BlockData
{
int unit1; // valid values are [0-259]
bool unit2;
int unit3; // valid values are [5-245]
}
I want to generate every combination of values for the above fields within the given ranges, and create one object for each unique combination.
Is there any utility function already in the .NET Framework to achieve this?

You could use LINQ (this assumes the fields of BlockData are accessible, for example declared public):
var all = from u1 in Enumerable.Range(0, 260)
          from u2 in Enumerable.Range(0, 2).Select(i => i == 1)
          from u3 in Enumerable.Range(5, 241)
          select new BlockData { unit1 = u1, unit2 = u2, unit3 = u3 };
Demo: https://dotnetfiddle.net/k0w2WZ
If you only want 100 of them, LINQ's deferred execution makes that easy and efficient, because enumeration stops once 100 objects have been produced:
List<BlockData> blockList = all.Take(100).ToList();
Or, if you want 100 where the bool unit2 is true:
List<BlockData> blockList = all.Where(b => b.unit2).Take(100).ToList();

LINQ to JSON group query on array

I have a sample of JSON data that I am converting to a JArray with Newtonsoft Json.NET.
string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";
I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:
sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1
However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:
JArray autoFeatures = JArray.Parse(jsonString);
var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };
foreach (var feature in features)
{
    Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
}
Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3
I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.

The problem is in the Select. You are telling it to turn every value found in the arrays into its own item. What you actually need is to combine all the values of each "features" array into a single string, and group on that string. Here is how you do it:
var features = from f in autoFeatures.Select(feat => string.Join(",", feat["features"].Values<string>()))
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };
This produces the following output:
sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
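One caveat is that a joined string is order-sensitive: "sunroof,spoiler" and "spoiler,sunroof" would land in different groups. One possible way around that is to sort the features before joining. The sketch below works on plain string arrays rather than a JArray so that it runs without Newtonsoft; FeatureGroups and CountCombinations are illustrative names, and the sample rows are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureGroups
{
    // Sort each feature list before joining so element order does not affect the key.
    public static Dictionary<string, int> CountCombinations(IEnumerable<string[]> rows) =>
        rows.Select(features => string.Join(",", features.OrderBy(f => f, StringComparer.Ordinal)))
            .GroupBy(key => key)
            .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        var rows = new[]
        {
            new[] { "sunroof", "mag wheels" },
            new[] { "mag wheels", "sunroof" }, // same features, different order
            new[] { "spoiler" },
        };
        foreach (var pair in CountCombinations(rows))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```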

You could use a HashSet to identify the distinct sets of features, and group on those sets. That way, your LINQ looks basically identical to what you have now, but GroupBy needs an additional IEqualityComparer class to compare one set of features to another and check whether they are the same.
For example:
var featureSets = autoFeatures
    .Select(feature => new HashSet<string>(feature["features"].Values<string>()))
    .GroupBy(a => a, new HashSetComparer<string>())
    .Select(a => new { Set = a.Key, Count = a.Count() })
    .OrderByDescending(a => a.Count);
foreach (var result in featureSets)
{
    Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}
And the comparer class leverages the SetEquals method of the HashSet class to check if one set is the same as another (and this handles the strings being in a different order within the set, etc.)
public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        // so if x and y both contain "sunroof" only, this is true
        // even if x and y are a different instance
        return x.SetEquals(y);
    }
    public int GetHashCode(HashSet<T> obj)
    {
        // force comparison every time by always returning the same,
        // or we could do something smarter like hash the contents
        return 0;
    }
}
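The framework also ships a ready-made comparer for this case: HashSet<T>.CreateSetComparer() returns an IEqualityComparer<HashSet<T>> that bases both equality and the hash code on the set contents, which avoids the constant hash code used above. A sketch on plain string arrays (no JSON parsing involved; SetGrouping and CountSets are illustrative names, and the sample rows are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetGrouping
{
    // Groups rows by their set of features; order and duplicates within a row are ignored.
    public static List<KeyValuePair<HashSet<string>, int>> CountSets(IEnumerable<string[]> rows) =>
        rows.Select(features => new HashSet<string>(features))
            // Built-in comparer: equality and hash code are both derived from the set contents.
            .GroupBy(set => set, HashSet<string>.CreateSetComparer())
            .Select(g => new KeyValuePair<HashSet<string>, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value)
            .ToList();

    public static void Main()
    {
        var rows = new[]
        {
            new[] { "sunroof", "mag wheels" },
            new[] { "mag wheels", "sunroof" }, // same set, different order
            new[] { "spoiler" },
        };
        foreach (var pair in CountSets(rows))
        {
            Console.WriteLine($"{string.Join(",", pair.Key)}: {pair.Value}");
        }
    }
}
```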

How to create group By in LINQ in a dictionary

I am learning LINQ queries and am stuck at one place. Suppose I have a strongly typed DataTable like the one below:
idGroup idUnit Status
1 12 foo
1 13 bar
1 15 hello
2 12 nofoo
2 16 nohello
I want the result like below:
int Generic List of int
1 12,13,15
2 12,16
In other words, I want to create a dictionary that is grouped by idGroup.
My Attempt:
Dictionary<int, List<int>> temp = Mydatatable.ToDictionary(p => p.idGroup, p => p.idUnit);
Error: the LINQ above returns <int>, <int>, but my expected result is <int>, List<int>.
I want something like below:
Dictionary<int, List<int>> temp = Mydatatable.ToDictionary(p => p.idGroup,
p => p.idUnit.ToList());

First of all, the error occurs because the value you specify in ToDictionary is idUnit, which is an int and not a List<int> (for instance, writing p => new List<int> { p.idUnit } would resolve that error).
Beyond that, to get a dictionary as output, call GroupBy first and then ToDictionary. Otherwise you will get an exception stating that the given key already exists in the dictionary.
var result = Mydatatable.GroupBy(key => key.idGroup, val => val.idUnit)
                        .ToDictionary(key => key.Key, val => val.ToList());
Another option is to use a Lookup instead of a Dictionary:
var result = Mydatatable.ToLookup(p => p.idGroup, p => p.idUnit);
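To make the difference between the two results concrete, here is a small self-contained sketch. It uses an array of (idGroup, idUnit) tuples as a stand-in for the typed DataTable, which is an assumption for illustration only; UnitGrouping, Rows and ToUnitDictionary are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnitGrouping
{
    // Stand-in for the typed DataTable rows from the question.
    public static readonly (int idGroup, int idUnit)[] Rows =
    {
        (1, 12), (1, 13), (1, 15), (2, 12), (2, 16),
    };

    // GroupBy collapses duplicate keys first, so ToDictionary never sees a key twice.
    public static Dictionary<int, List<int>> ToUnitDictionary(
        IEnumerable<(int idGroup, int idUnit)> rows) =>
        rows.GroupBy(r => r.idGroup, r => r.idUnit)
            .ToDictionary(g => g.Key, g => g.ToList());

    public static void Main()
    {
        foreach (var pair in ToUnitDictionary(Rows))
        {
            Console.WriteLine($"{pair.Key}: {string.Join(",", pair.Value)}");
        }

        // A lookup holds the same grouping, is read-only, and returns
        // an empty sequence (not an exception) for a key that is missing.
        var lookup = Rows.ToLookup(r => r.idGroup, r => r.idUnit);
        Console.WriteLine(lookup[3].Count());
    }
}
```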

LINQ - GroupBy multiple columns and merge the result

I am working with a sizeable set of data (~130,000 records), and I've managed to transform it the way I want it (to CSV).
Here is a simplified example of what the List looks like:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2"
"Surname2, Name2;Address2;State2;YES;Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Now, I would like to merge the records if the 1st, 2nd AND 3rd columns match, like so:
output
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2 Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Here's what I've got so far:
output.GroupBy(x => new { c1 = x.Split(';')[0], c2 = x.Split(';')[1], c3 = x.Split(';')[2] }).Select(/* have no idea what should go here */);

First, project the columns you need into an anonymous type:
var query = from r in output
            let columns = r.Split(';')
            select new { c1 = columns[0], c2 = columns[1], c3 = columns[2], c5 = columns[4] };
Then create the groups, using the anonymous type defined in the previous query:
var result = query.GroupBy(e => new { e.c1, e.c2, e.c3 })
                  .Select(g => new { Name = g.Key.c1,
                                     Address = g.Key.c2,
                                     State = g.Key.c3,
                                     Groups = String.Join(",", g.Select(e => e.c5)) });
I know I'm missing some columns, but I think you get the idea.
PS: I have split the logic into two queries only for readability. You can compose both into a single query, but that will not change the performance, because LINQ uses deferred execution.
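For completeness, here is one way the idea could be carried through to the output format shown in the question. This is a sketch under assumptions: the fourth column is taken from the first record of each group, the group names are joined with a space as in the expected output, and RecordMerger and Merge are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordMerger
{
    // Merges lines whose first three ';'-separated columns match and
    // joins the values of the last column with a space.
    public static List<string> Merge(IEnumerable<string> lines) =>
        lines.Select(line => line.Split(';'))
             .GroupBy(c => new { Name = c[0], Address = c[1], State = c[2] })
             .Select(g => string.Join(";",
                 g.Key.Name, g.Key.Address, g.Key.State,
                 g.First()[3], // assumption: 4th column comes from the first record
                 string.Join(" ", g.Select(c => c[4]))))
             .ToList();

    public static void Main()
    {
        var input = new List<string>
        {
            "Surname1, Name1;Address1;State1;YES;Group1",
            "Surname2, Name2;Address2;State2;YES;Group2",
            "Surname2, Name2;Address2;State2;YES;Group1",
            "Surname3, Name3;Address3;State3;NO;Group1",
            "Surname1, Name1;Address2;State1;YES;Group1",
        };
        foreach (string line in Merge(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

GroupBy keeps groups in order of first appearance, so the merged lines come out in the same order as the question's expected output.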

This is how I would do it:
using System.Collections.Generic;
using System.Linq;
class Program
{
    static void Main(string[] args)
    {
        List<string> input = new List<string> {
            "Surname1, Name1;Address1;State1;YES;Group1",
            "Surname2, Name2;Address2;State2;YES;Group2",
            "Surname2, Name2;Address2;State2;YES;Group1",
            "Surname3, Name3;Address3;State3;NO;Group1",
            "Surname1, Name1;Address2;State1;YES;Group1",
        };
        // Key = first 4 columns (compared on the first 3 only); the result is the key
        // plus the last column of every member of the group, joined with a space.
        var transformed = input.Select(s => s.Split(';'))
            .GroupBy(s => new string[] { s[0], s[1], s[2], s[3] },
                (key, elements) => string.Join(";", key) + ";" + string.Join(" ", elements.Select(e => e.Last())),
                new MyEqualityComparer())
            .ToList();
    }
}
internal class MyEqualityComparer : IEqualityComparer<string[]>
{
    // Two keys are equal when their first three columns match.
    public bool Equals(string[] x, string[] y)
    {
        return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
    }
    // The hash code uses the same three columns that Equals compares.
    public int GetHashCode(string[] obj)
    {
        int hashCode = obj[0].GetHashCode();
        hashCode = hashCode ^ obj[1].GetHashCode();
        hashCode = hashCode ^ obj[2].GetHashCode();
        return hashCode;
    }
}
Consider the first 4 columns as the grouping key, but only use the first 3 for the comparison (hence the custom IEqualityComparer).
Once you have the (key, elements) groups, transform each one by joining the elements of the key with ; (remember, the key consists of the first 4 columns) and appending the last element of every member of the group, joined with a space.

How to calculate a running total using linq

I have a LINQ query result as shown in the image. In the final query (not shown) I am grouping by Year and by LeaveType. However, I want to calculate a running total of LeaveCarriedOver per type over the years. That is, the sick LeaveCarriedOver in 2010 becomes the "opening" balance for sick leave in 2011, to which the 2011 value is added.
I have done another query on the shown result list that looks like:
var leaveDetails1 = (from l in leaveDetails
                     select new
                     {
                         l.Year,
                         l.LeaveType,
                         l.LeaveTaken,
                         l.LeaveAllocation,
                         l.LeaveCarriedOver,
                         RunningTotal = leaveDetails.Where(x => x.LeaveType == l.LeaveType).Sum(x => x.LeaveCarriedOver)
                     });
where leaveDetails is the result from the image.
The resulting RunningTotal is not cumulative as expected. How can I achieve my initial goal? I am open to any ideas; my last option will be to do it in JavaScript in the front end. Thanks in advance.

The simple implementation is to get the list of possible totals first, then get the sum from the details for each of these categories.
Getting the distinct list of Year and LeaveType is a group-by followed by selecting the first element of each group. The grouping key is a Tuple<int, string>, where the int is the year and the string is the LeaveType:
var distinctList = leaveDetails1.GroupBy(data => new Tuple<int, string>(data.Year, data.LeaveType)).Select(data => data.FirstOrDefault()).ToList();
Then we want a total for each of these elements, so we project that list to return the identifier (Year and LeaveType) plus the total, which adds an extra value to the tuple: Tuple<int, string, int>.
var totals = distinctList.Select(data => new Tuple<int, string, int>(data.Year, data.LeaveType, leaveDetails1.Where(detail => detail.Year == data.Year && detail.LeaveType == data.LeaveType).Sum(detail => detail.LeaveCarriedOver))).ToList();
Reading the line above, you can see that it takes each distinct Year and LeaveType pair, creates a new object, stores the Year and LeaveType for reference, and then sets the last int to the Sum of the filtered details for that Year and LeaveType.

If I completely understand what you are trying to do then I don't think I would rely on the built in LINQ operators exclusively. I think (emphasis on think) that any combination of the built in LINQ operators is going to solve this problem in O(n^2) run-time.
If I were going to implement this in LINQ then I would create an extension method for IEnumerable that is similar to the Scan function in reactive extensions (or find a library out there that has already implemented it):
public static class EnumerableExtensions
{
    public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> accumulator)
    {
        // Validation omitted for clarity.
        foreach (TSource value in source)
        {
            seed = accumulator.Invoke(seed, value);
            yield return seed;
        }
    }
}
Then this should do it around O(n log n) (because of the order by operations):
leaveDetails
    .OrderBy(x => x.LeaveType)
    .ThenBy(x => x.Year)
    .Scan(new {
            Year = 0,
            LeaveType = "Seed",
            LeaveTaken = 0,
            LeaveAllocation = 0.0,
            LeaveCarriedOver = 0.0,
            RunningTotal = 0.0
        },
        (acc, x) => new {
            x.Year,
            x.LeaveType,
            x.LeaveTaken,
            x.LeaveAllocation,
            x.LeaveCarriedOver,
            RunningTotal = x.LeaveCarriedOver + (acc.LeaveType != x.LeaveType ? 0 : acc.RunningTotal)
        });
You don't say, but I assume the data is coming from a database; if that is the case then you could get leaveDetails back already sorted and skip the sorting here. That would get you down to O(n).
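To see the approach end to end, here is a self-contained sketch on a few made-up rows. It repeats the Scan method so that it compiles on its own, and it uses value tuples instead of anonymous types; the column types, the sample data, and the names RunningTotalDemo and Totals are all assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningTotalDemo
{
    // Same shape as the Scan extension above, repeated so the sketch stands alone.
    public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> accumulator)
    {
        foreach (TSource value in source)
        {
            seed = accumulator(seed, value);
            yield return seed;
        }
    }

    // Returns (Year, LeaveType, RunningTotal) rows; the total restarts for each leave type.
    public static List<(int Year, string LeaveType, double RunningTotal)> Totals(
        IEnumerable<(int Year, string LeaveType, double LeaveCarriedOver)> rows) =>
        rows.OrderBy(r => r.LeaveType)
            .ThenBy(r => r.Year)
            .Scan(
                (Year: 0, LeaveType: "", RunningTotal: 0.0), // seed; assumes no real type is ""
                (acc, r) => (r.Year, r.LeaveType,
                    RunningTotal: r.LeaveCarriedOver
                        + (acc.LeaveType == r.LeaveType ? acc.RunningTotal : 0.0)))
            .ToList();

    public static void Main()
    {
        var rows = new (int Year, string LeaveType, double LeaveCarriedOver)[]
        {
            (2010, "Sick", 2.0), (2011, "Sick", 3.0),
            (2010, "Annual", 1.0), (2011, "Annual", 4.0),
        };
        foreach (var t in Totals(rows))
        {
            Console.WriteLine($"{t.Year} {t.LeaveType} {t.RunningTotal}");
        }
    }
}
```

With these rows, the 2011 total for each leave type is the 2010 value plus the 2011 value, and the total starts again when the leave type changes.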
If you don't want to create an extension method (or go find one) then this will achieve the same thing (just in an uglier way).
var temp = new
{
    Year = 0,
    LeaveType = "Who Cares",
    LeaveTaken = 3,
    LeaveAllocation = 0.0,
    LeaveCarriedOver = 0.0,
    RunningTotal = 0.0
};
var runningTotals = (new[] { temp }).ToList();
runningTotals.RemoveAt(0);
foreach (var l in leaveDetails.OrderBy(x => x.LeaveType).ThenBy(x => x.Year))
{
    var s = runningTotals.LastOrDefault();
    runningTotals.Add(new
    {
        l.Year,
        l.LeaveType,
        l.LeaveTaken,
        l.LeaveAllocation,
        l.LeaveCarriedOver,
        RunningTotal = l.LeaveCarriedOver + (s == null || s.LeaveType != l.LeaveType ? 0 : s.RunningTotal)
    });
}
This should also be O(n log n) or O(n) if you can pre-sort leaveDetails.

If I understand the question, you want something like this:
decimal RunningTotal = 0;
var results = leaveDetails
    .GroupBy(r => r.LeaveType)
    .Select(r => new
    {
        Dummy = RunningTotal = 0,
        results = r.OrderBy(o => o.Year)
            .Select(l => new
            {
                l.Year,
                l.LeaveType,
                l.LeaveAllocation,
                l.LeaveCarriedOver,
                RunningTotal = (RunningTotal = RunningTotal + l.LeaveCarriedOver)
            })
    })
    .SelectMany(a => a.results).ToList();
This is basically using the Select<TSource, TResult> overload to calculate the running balance, but first grouped by LeaveType so we can reset the RunningTotal for every LeaveType, and then ungrouped at the end.

You need the SUM window function here, which is not supported by EF Core or by earlier versions of EF. So just write the SQL and run it via Dapper:
SELECT
    l.Year,
    l.LeaveType,
    l.LeaveTaken,
    l.LeaveAllocation,
    l.LeaveCarriedOver,
    -- partition by type and order by year so the sum accumulates across years
    SUM(l.LeaveCarriedOver) OVER (PARTITION BY l.LeaveType ORDER BY l.Year) AS RunningTotal
FROM leaveDetails l
Or, if you are using EF Core, use the linq2db.EntityFrameworkCore package:
var leaveDetails1 = from l in leaveDetails
                    select new
                    {
                        l.Year,
                        l.LeaveType,
                        l.LeaveTaken,
                        l.LeaveAllocation,
                        l.LeaveCarriedOver,
                        RunningTotal = Sql.Ext.Sum(l.LeaveCarriedOver).Over().PartitionBy(l.LeaveType).OrderBy(l.Year).ToValue()
                    };
// switch to alternative LINQ translator
leaveDetails1 = leaveDetails1.ToLinqToDB();
