LINQ to JSON group query on array - c#

I have a sample of JSON data that I am converting to a JArray with NewtonSoft.
string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";
I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:
sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1
However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:
JArray autoFeatures = JArray.Parse(jsonString);
var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
foreach (var feature in features)
{
Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
}
Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3
I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.

The problem is in the Select. It tells LINQ to treat each value found in the arrays as its own item. What you actually need is to combine all the values of each entry into a single string, so that the whole combination becomes the grouping key. Here is how to do it:
var features = from f in autoFeatures.Select(feat => string.Join(",",feat["features"].Values<string>()))
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
Produces the following output
sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
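One caveat with a joined string as the key is that it is order-sensitive: "sunroof,mag wheels" and "mag wheels,sunroof" would land in different groups. A minimal sketch of one way around that, assuming the feature lists have already been pulled out of the JArray as plain string sequences (the FeatureCombos class and its Count method are hypothetical names, not part of Json.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureCombos
{
    // Counts how often each combination of features occurs, ignoring the
    // order in which the features are listed inside a single entry.
    public static List<KeyValuePair<string, int>> Count(
        IEnumerable<IEnumerable<string>> featureLists)
    {
        return featureLists
            // Sort before joining so ["a","b"] and ["b","a"] share one key.
            .Select(list => string.Join(",", list.OrderBy(f => f, StringComparer.Ordinal)))
            .GroupBy(key => key)
            .OrderByDescending(g => g.Count())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

With the question's data this could be called as FeatureCombos.Count(autoFeatures.Select(f => f["features"].Values<string>())). Note that the keys then come out in sorted order rather than in the order the features appeared in the JSON.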

You could use a HashSet to identify the distinct sets of features, and group on those sets. That way, your Linq looks basically identical to what you have now, but you need an additional IEqualityComparer class in the GroupBy to help compare one set of features to another to check if they're the same.
For example:
var featureSets = autoFeatures
.Select(feature => new HashSet<string>(feature["features"].Values<string>()))
.GroupBy(a => a, new HashSetComparer<string>())
.Select(a => new { Set = a.Key, Count = a.Count() })
.OrderByDescending(a => a.Count);
foreach (var result in featureSets)
{
Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}
And the comparer class leverages the SetEquals method of the HashSet class to check if one set is the same as another (and this handles the strings being in a different order within the set, etc.)
public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// so if x and y both contain "sunroof" only, this is true
// even if x and y are a different instance
return x.SetEquals(y);
}
public int GetHashCode(HashSet<T> obj)
{
// force comparison every time by always returning the same,
// or we could do something smarter like hash the contents
return 0;
}
}
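If writing the comparer by hand is not required, HashSet<T> already ships one: the static HashSet<T>.CreateSetComparer() method returns an IEqualityComparer<HashSet<T>> that compares by set contents and hashes the contents rather than returning a constant. A minimal sketch under the assumption that the feature lists are available as plain string sequences (FeatureSets and Count are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureSets
{
    // Groups identical feature sets together, regardless of element order,
    // using the set comparer that HashSet<T> provides out of the box.
    public static List<KeyValuePair<HashSet<string>, int>> Count(
        IEnumerable<IEnumerable<string>> featureLists)
    {
        return featureLists
            .Select(list => new HashSet<string>(list))
            .GroupBy(set => set, HashSet<string>.CreateSetComparer())
            .OrderByDescending(g => g.Count())
            .Select(g => new KeyValuePair<HashSet<string>, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

Because the built-in comparer produces a content-based hash code, GroupBy can bucket the sets instead of comparing every set against every other one, which is what a constant hash code forces.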

Related

Linq Dynamic Expressions Groupby with sum

I cannot figure out how to do a Linq.Dynamic on an ObservableCollection and sum some fields. Basically I want to do this;
var grouped =
from x in MyCollection
group x by x.MyField into g
select new MyClass
{
MyField = g.Key,
Total = g.Sum(y => y.Total)
};
Figured it would be this in Linq.Dynamic;
var dGroup = MyCollection
.GroupBy("MyField ", "it")
.Select("new (it.Key as MyField , it as new MyClass { Total = sum(it.Total ) })");
However, this keeps giving me errors.
FYI MyCollection is an ObservableCollection<MyClass>
Edit:
I am sorry I did not make this very clear. The reason I need it to be Linq.Dynamic is that the actual MyClass has about 10 properties that the user can pick to group the collection MyCollection by. To make matters worse, the user can select multiple groupings, so hand-coding the groups isn't an option. So while @Harald Coppoolse's answer does work, it requires myClass.MyField to be hand-coded.
So MyCollection is a sequence of MyClass objects, where every MyClass object has at least two properties: MyField and Total.
You want the sum of all Total values that have the same value for MyField
For example:
MyField Total
X 10
Y 5
X 7
Y 3
You want a sequence with two elements: one for the X with a grand total of 10 + 7 = 17; and one for the Y with a grand total of 5 + 3 = 8
In method syntax:
var result = MyCollection.Cast<MyClass>() // take all elements of MyCollection
.GroupBy(myClass => myClass.MyField) // Group into groups with same MyField
.Select(group => new MyClass() // for every group make one new MyClass object
{
MyField = group.Key,
Total = group // to calculate the Total:
.Select(groupElement => groupElement.Total) // get Total for all elements in the group
.Sum(), // and sum it
});
If your ObservableCollection is in fact an ObservableCollection<MyClass> then you won't need the Cast<MyClass> part.
There is a lesser known GroupBy overload that will do this in one statement. I'm not sure if this one will improve readability:
var result = MyCollection.Cast<MyClass>() // take all elements of MyCollection
.GroupBy(myClass => myClass.MyField, // group into groups with same MyField
myClass => myClass.Total, // for every element in the group take the Total
(myField, totals) => new MyClass() // from the common myField and all totals in the group
{ // make one new MyClass object
MyField = myField, // the common myField of all elements in the group
Total = totals.Sum(), // sum all found totals in the group
});
So this might not be the best way, but it is the only one I found. FYI, it is more manual work than standard LINQ.
Here is the new Dynamic Linq;
var dGroup = MyCollection
.GroupBy("MyField ", "it")
.Select("new (it.Key as MainID, it as MyClass)");
The next problem is that you need to iterate not only over each MainID but also over its MyClass items, summing Total for each MainID:
foreach (dynamic r in dGroup)
{
dynamic gTotal = 0; // running total, reset for each MainID
foreach (dynamic g in r.MyClass)
{
gTotal = gTotal + g.Total;
}
// gTotal now holds the sum of Total for r.MainID
}
If someone can show me a better way to do this I will award the correct answer to that.
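An alternative that avoids both hand-coded keys and the nested loops is to resolve the user-selected property names with reflection and build the grouping key at runtime. This is only a sketch and swaps Linq.Dynamic for plain LINQ to Objects; the OtherField property, the decimal type of Total, and the RuntimeGrouping helper are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class MyClass
{
    public string MyField { get; set; }
    public string OtherField { get; set; }  // hypothetical second grouping property
    public decimal Total { get; set; }      // assumed numeric type
}

public static class RuntimeGrouping
{
    // Groups items by the properties named at runtime and sums Total per group.
    // The key is the property values joined with '|', so values that themselves
    // contain '|' could collide; a sketch, not a hardened implementation.
    public static Dictionary<string, decimal> SumTotalBy(
        IEnumerable<MyClass> items, params string[] propertyNames)
    {
        PropertyInfo[] props = propertyNames
            .Select(name => typeof(MyClass).GetProperty(name)
                ?? throw new ArgumentException("Unknown property: " + name))
            .ToArray();

        return items
            .GroupBy(item => string.Join("|", props.Select(p => p.GetValue(item))))
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Total));
    }
}
```

For example, RuntimeGrouping.SumTotalBy(MyCollection, "MyField") groups by one property, and passing several names groups by their combination. Reflection on every element is slower than a compiled selector, so for large collections a cached delegate or an expression tree may be worth the extra work.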

select multiple objects from list based on values from another list

I have two lists. One list is of type Cascade (named cascadeList) and the other list is of type PriceDetails (named priceList); both classes are shown below. I have also given a simple example of what I'm trying to achieve below the classes.
The priceList contains a list of PriceDetails objects, where there can be multiple (up to three) PriceDetails objects with the same ISIN. When there are multiple PriceDetails with the same ISIN, I want to select just one based on the Source field.
This is where the cascadeList comes in. If there were 3 PriceDetails with the same ISIN, I would want to select the one whose source has the highest rank in the cascade list (1 is the highest). Hopefully the example below helps.
Reason for the question
I do have some code that is doing this for me, however it's not very efficient due to my lack of skill.
In a nutshell, it first creates a unique list of ISINs from the priceList. It then loops through this list and, for each unique ISIN, gets the list of PriceDetails with that ISIN and uses some if statements to determine which object I want. So I'm hoping, and pretty sure, there is a better way to do this.
My Classes
class Cascade
{
int Rank;
string Source;
}
class PriceDetails
{
string ISIN;
string Sedol;
double Price;
string Source;
}
Example
PriceList
ISIN Source Price
BN1 XYZ 100
MGH PLJ 102
BN1 PLJ 99.5
BN1 ABC 98
MGH XYZ 102
Cascade
Source Rank
ABC 1
XYZ 2
PLJ 3
Result I'm looking for
PriceList
ISIN Source Price
BN1 ABC 98
MGH XYZ 102
To get the desired result, we must do these steps:
Join the two lists on the Source property.
Group the joined result by the ISIN property.
After grouping, get the minimum rank for each ISIN.
Then compare that minRank against the rank of the elements with the same ISIN and select the first element that matches.
We can write this query either with query or method syntax.
With query syntax:
var result = from pr in pricesList
join cas in cascadesList on pr.Source equals cas.Source
select new { pr, cas } into s
group s by new { s.pr.ISIN } into prcd
let minRank = prcd.Min(x => x.cas.Rank)
select prcd.First(y => y.cas.Rank == minRank).pr;
With method syntax:
var result = pricesList.Join(cascadesList,
pr => pr.Source,
cas => cas.Source,
(pr, cas) => new { pr, cas })
.GroupBy(j => j.pr.ISIN)
.Select(g => new { g, MinRank = g.Min(x => x.cas.Rank) })
.Select(r => r.g.First(x => x.cas.Rank == r.MinRank).pr);
Result will be same with both ways:
PriceList
ISIN Source Price
BN1 ABC 98
MGH XYZ 102
P.S. I have assumed that the lists are named pricesList and cascadesList.
from pr in priceList
join c in cascadeList on pr.Source equals c.Source
orderby c.Rank
select new { Isin = pr.ISIN, Source = pr.Source, Price = pr.Price }
See if this works for you:
priceList.GroupBy(p => p.ISIN).Select(g => g.OrderBy(p =>
cascadeList.First(c => c.Source == p.Source).Rank).First());
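Searching the cascade list once per price row gets slow as the lists grow. A sketch of the same selection with the ranks loaded into a dictionary first; it assumes public properties on both classes (the question shows bare fields) and that each Source appears only once in the cascade list. PricePicker and PickBySourceRank are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cascade
{
    public int Rank { get; set; }
    public string Source { get; set; }
}

public class PriceDetails
{
    public string ISIN { get; set; }
    public string Sedol { get; set; }
    public double Price { get; set; }
    public string Source { get; set; }
}

public static class PricePicker
{
    // For each ISIN, keeps the price whose source has the best (lowest) rank.
    public static List<PriceDetails> PickBySourceRank(
        IEnumerable<PriceDetails> priceList, IEnumerable<Cascade> cascadeList)
    {
        // One pass over the cascade list; lookups are then O(1) per price.
        Dictionary<string, int> rankBySource =
            cascadeList.ToDictionary(c => c.Source, c => c.Rank);

        return priceList
            .GroupBy(p => p.ISIN)
            .Select(g => g
                // Sources missing from the cascade list sort last.
                .OrderBy(p => rankBySource.TryGetValue(p.Source, out int rank)
                    ? rank
                    : int.MaxValue)
                .First())
            .ToList();
    }
}
```

Unlike the join-based queries, this version does not drop prices whose source is absent from the cascade list; it only ranks them last, which may or may not be the behaviour you want.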

Frequency table with zero counts for all values [duplicate]

I have a string that contains only digits. I'm interested in generating a frequency table of the digits. Here's an example string:
var candidate = "424256";
This code works, but looking up a digit that's not in the string throws a KeyNotFoundException:
var frequencyTable = candidate
.GroupBy(x => x)
.ToDictionary(g => g.Key, g => g.Count());
Which yields:
Key Count
4 2
2 2
5 1
6 1
So, I used this code, which works:
var frequencyTable = (candidate + "1234567890")
.GroupBy(x => x)
.ToDictionary(g => g.Key, g => g.Count() - 1);
However, in other use cases, I don't want to have to specify all the possible key values.
Is there an elegant way of inserting 0-count records into the frequencyTable dictionary without resorting to creating a custom collection with this behavior, such as this?
public class FrequencyTable<K> : Dictionary<K, int>
{
public FrequencyTable(IDictionary<K, int> dictionary)
: base(dictionary)
{ }
public new int this[K index]
{
get
{
if (ContainsKey(index))
return base[index];
return 0;
}
}
}
If you do not somehow specify all possible key values, your dictionary will not contain an entry for such keys.
Rather than storing zero counts, you may wish to use
Dictionary.TryGetValue(...)
to test the existence of the key before trying to access it. If TryGetValue returns false, simply return 0.
You could easily wrap that in an extension method (rather than creating a custom collection).
static public class Extensions
{
static public int GetFrequencyCount<K>(this Dictionary<K, int> counts, K value)
{
int result;
if (counts.TryGetValue(value, out result))
{
return result;
}
else return 0;
}
}
Usage:
Dictionary<char, int> counts = new Dictionary<char, int>();
counts.Add('1', 42);
int count = counts.GetFrequencyCount<char>('1');
If there is a pattern for all the possible keys, you can use Enumerable.Range (or a for loop) to generate 0-value keys as a base table, then left join in the frequency data to populate the relevant values:
// test value
var candidate = "424256";
// generate base table of all possible keys
var baseTable = Enumerable.Range('0', '9' - '0' + 1).Select(e => (char)e);
// generate freqTable
var freqTable = candidate.ToCharArray().GroupBy (c => c);
// left join frequency table results to base table
var result =
from b in baseTable
join f in freqTable on b equals f.Key into gj
from subFreq in gj.DefaultIfEmpty()
select new { Key = b, Value = (subFreq == null) ? 0 : subFreq.Count() };
// convert final result into dictionary
var dict = result.ToDictionary(r => r.Key, r => r.Value);
Sample result:
Key Value
0 0
1 0
2 2
3 0
4 2
5 1
6 1
7 0
8 0
9 0
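When the full set of keys is known, the left join can also be expressed as a dictionary built over the keys, with TryGetValue supplying the zero for anything that was never counted. A minimal sketch; the Frequency class and its Table method are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Frequency
{
    // Builds a frequency table that contains every key in allKeys,
    // with a count of 0 for keys that never occur in items.
    public static Dictionary<T, int> Table<T>(IEnumerable<T> items, IEnumerable<T> allKeys)
    {
        Dictionary<T, int> counts = items
            .GroupBy(x => x)
            .ToDictionary(g => g.Key, g => g.Count());

        return allKeys
            .Distinct()
            .ToDictionary(k => k, k => counts.TryGetValue(k, out int n) ? n : 0);
    }
}
```

For the digit example this would be Frequency.Table("424256", "0123456789"). Items that are not listed in allKeys are left out of the result, so the key list has to be complete.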

Get min value in row during LINQ query

I know that I can use .Min() to get minimum value from column, but how to get minimum value in a row?
I have following LINQ query (for testing purposes):
from p in Pravidloes
where p.DulezitostId == 3
where p.ZpozdeniId == 1 || p.ZpozdeniId == 2
where p.SpolehlivostId == 2 || p.SpolehlivostId == 3
group p by p.VysledekId into g
select new {
result = g.Key,
value = g
}
Which results in this (screenshot of the grouped rows not preserved):
I would however like to get only the MIN value of following three columns:
DulezitostId, ZpozdeniId, SpolehlivostId as a value in:
select new {
result = g.Key,
value = g // <-- here
}
The final result then should look like:
result: 2, value: 1
result: 3, value: 2
I have been looking for similar questions here and googled for few examples with grouping and aggregating queries, but found nothing that would move me forward with this problem.
Btw: Solution isn't limited to linq, if you know better way how to do it.
You could create an array of the values and do Min on those.
select new {
result = g.Key,
value = g.SelectMany(x => new int[] { x.DulezitostId, x.ZpozdeniId, x.SpolehlivostId }).Min()
}
This will return the min for those 3 values in each grouping for ALL rows of that grouping.
Which would result in something like this...
result: 3, value: 1
The below will select the min for each row in the grouping...
select new {
result = g.Key,
value = g.Select(x => new int[] { x.DulezitostId, x.ZpozdeniId, x.SpolehlivostId }.Min())
}
Which would result in something like this...
result: 3, value: 1, 2
The best solution if you're using straight LINQ is Chad's answer. However, if you're using Linq To SQL it won't work because you can't construct an array like that.
Unfortunately, I believe the only way to do this in Linq To Sql is to use Math.Min repeatedly:
select new {
result = g.Key,
value = g.Min(x => Math.Min(Math.Min(x.DulezitostId, x.ZpozdeniId), x.SpolehlivostId))
}
This will generate some ugly CASE WHEN ... statements, but it works.
The main advantage of doing it this way is that you're only returning the data you need from SQL (instead of returning all 3 columns and doing the Min in the application).
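For reference, the nested Math.Min idea can be tried out against in-memory data before pointing it at the database. This is a LINQ to Objects sketch only; the Rule class is a hypothetical stand-in for a Pravidlo row, and RowMin and MinPerGroup are made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the Pravidloes table.
public class Rule
{
    public int VysledekId { get; set; }
    public int DulezitostId { get; set; }
    public int ZpozdeniId { get; set; }
    public int SpolehlivostId { get; set; }
}

public static class RowMin
{
    // For each VysledekId, returns the smallest of the three Id columns
    // across all rows of that group.
    public static Dictionary<int, int> MinPerGroup(IEnumerable<Rule> rules)
    {
        return rules
            .GroupBy(r => r.VysledekId)
            .ToDictionary(
                g => g.Key,
                g => g.Min(r => Math.Min(
                    Math.Min(r.DulezitostId, r.ZpozdeniId),
                    r.SpolehlivostId)));
    }
}
```

The inner Math.Min calls take the minimum within a row, and the outer g.Min takes the minimum of those row values within the group, so each group ends up with a single number.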

Count occurrences of values across multiple columns

I am having a terrible time finding a solution to what I am sure is a simple problem.
I started an app with data in Lists of objects. Its pertinent objects used to look like this (very simplified):
class A {
int[] Nums;
}
and
List<A> myListOfA;
I wanted to count occurrences of values in the member array over all the List.
I found this solution somehow:
var results =
from a in myListOfA
from n in a.Nums
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };
int NumberOfValues = results.Count();
That worked well and I was able to generate the histogram I wanted from the query.
Now I have converted to using an SQL database. The table I am using now looks like this:
MyTable {
int Value1;
int Value2;
int Value3;
int Value4;
int Value5;
int Value6;
}
I have a DataContext that maps to the DB.
I cannot figure out how to translate the previous LINQ statement to work with this. I have tried this:
MyDataContext myContext;
var results =
from d in myContext.MyTable
from n in new{ d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6 }
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };
I have tried some variations on the constructed array like adding .AsQueryable() at the end - something I saw somewhere else. I have tried using group to create the array of values but nothing works. I am a relative newbie when it come to database languages. I just cannot find any clue anywhere on the web. Maybe I am not asking the right question. Any help is appreciated.
I received help on a Microsoft site. The problem is mixing LINQ to SQL with LINQ to Objects.
This is how the query should be stated:
var results =
from d in MyContext.MyTable.AsEnumerable()
from n in new[]{d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6}
group n by n into g
orderby g.Key
select new {number = g.Key, Occureneces = g.Count()};
Works like a charm.
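The same client-side counting can be written in method syntax, where SelectMany plays the role of the second from clause. A sketch against in-memory rows; MyRow is a hypothetical stand-in for the mapped MyTable entity, and ValueHistogram is a made-up helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one row of MyTable.
public class MyRow
{
    public int Value1 { get; set; }
    public int Value2 { get; set; }
    public int Value3 { get; set; }
    public int Value4 { get; set; }
    public int Value5 { get; set; }
    public int Value6 { get; set; }
}

public static class ValueHistogram
{
    // Flattens the six columns of every row into one sequence and counts
    // how often each value occurs, ordered by value.
    public static List<KeyValuePair<int, int>> Count(IEnumerable<MyRow> rows)
    {
        return rows
            .SelectMany(r => new[] { r.Value1, r.Value2, r.Value3, r.Value4, r.Value5, r.Value6 })
            .GroupBy(n => n)
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

As with the AsEnumerable() query, all rows are pulled into memory before the counting happens, so this suits tables of modest size.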
If you wish to stay in LINQ to SQL, you could try this "hack" that I recently discovered. It isn't the prettiest or cleanest code, but at least you won't have to fall back to LINQ to Objects.
var query =
from d in MyContext.MyTable
let v1 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value1)
let v2 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value2)
// ...
let v6 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value6)
from n in v1.Concat(v2).Concat(v3).Concat(v4).Concat(v5).Concat(v6)
group 1 by n into g
orderby g.Key
select new
{
number = g.Key,
Occureneces = g.Count(),
};
How about creating your int array on the fly?
var results =
from d in myContext.MyTable
from n in new int[] { d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6 }
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };
In a relational database, such as SQL Server, collections are represented as tables. So you should actually have two tables: Samples and Values. The Samples table would represent a single "A" object, while the Values table would represent each element in A.Nums, with a foreign key pointing to one of the records in the Samples table. LINQ to SQL's O/R mapper will then create a "Values" property for each Sample object, which contains a queryable collection of the attached Values. You would then use the following query:
var results =
from sample in myContext.Samples
from value in sample.Values
group value by value into values
orderby values.Key
select new { Value = values.Key, Frequency = values.Count() };
