Classify values on ranges

I have an enumeration as follows:
public enum BPLevel {
Normal = 1,
HighNormal = 2,
HypertensionStage1 = 3,
ModerateHypertensionStage2 = 4,
SeverHypertensionStage3 = 5,
} // BloodPressureLevel
And I have the following classification:
I am using Entity Framework and I need to count how many persons are in each level:
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic, BPSystolic = x.BPSystolic })
.Count( ...
My problem is: how can I apply this classification in my query?

I would just add a classification member that is assigned the result of a function call:
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic,
BPSystolic = x.BPSystolic,
Classification = GetClassification(x.BPDiastolic, x.BPSystolic) })
.Count( ...
BPLevel GetClassification(int diastolic, int systolic)
{
...
}
EF cannot always translate operations that happen inside a query (such as a call to your own method) into SQL, so you may need to do a ToList before the Select to bring the data into memory (so that it's LINQ to Objects).
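A minimal sketch of what such a function might look like, run against an in-memory sequence rather than an EF context. The classification table from the question is not reproduced on this page, so the thresholds here are borrowed from the nested ternary expression in the last answer below and should be treated as an assumption. The `Reading` record and the `CountByLevel` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum BPLevel
{
    Normal = 1,
    HighNormal = 2,
    HypertensionStage1 = 3,
    ModerateHypertensionStage2 = 4,
    SeverHypertensionStage3 = 5,
}

// Hypothetical stand-in for the columns selected from the Persons table.
public record Reading(int PersonId, int BPDiastolic, int BPSystolic);

public static class BloodPressure
{
    // Assumed thresholds: a level applies only when BOTH values are below its limits.
    public static BPLevel GetClassification(int diastolic, int systolic)
    {
        if (diastolic < 85 && systolic < 130) return BPLevel.Normal;
        if (diastolic < 90 && systolic < 140) return BPLevel.HighNormal;
        if (diastolic < 100 && systolic < 160) return BPLevel.HypertensionStage1;
        if (diastolic < 110 && systolic < 180) return BPLevel.ModerateHypertensionStage2;
        return BPLevel.SeverHypertensionStage3;
    }

    // Counts readings per level. This runs as LINQ to Objects,
    // so calling an ordinary C# method inside the query is fine.
    public static IDictionary<BPLevel, int> CountByLevel(IEnumerable<Reading> readings) =>
        readings
            .GroupBy(r => GetClassification(r.BPDiastolic, r.BPSystolic))
            .ToDictionary(g => g.Key, g => g.Count());
}
```

With EF, the projected rows could be passed to `CountByLevel` after an `AsEnumerable()` or `ToList()` call. Levels that nobody falls into are simply absent from the resulting dictionary.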

What I'd do in this case is add a helper property to Person:
public BPLevel BpLevel
{
get
{
if (Systolic >= 180)
return BPLevel.SeverHypertensionStage3;
else if
...
}
}
and then I'd do a group by:
.ToList() // you need to execute against the DB before you call the helper property
.GroupBy(x => x.BpLevel)
.Select(x => /* more data transformation:
x is a collection of Person,
x.Key is the BPLevel */ )
Make sure that you include the ToList() call; otherwise you might get a NotSupportedException when EF tries to convert your helper property to SQL.

Another option would be to Select into a concrete class whose BPLevel property getter does the classification for you:
public class PersonWithBP {
// other properties
public BPLevel BPClassification {
get {
// logic to calculate BPLevel
return bpLevel;
}
}
}
Now your select becomes
.Select(x => new PersonWithBP { /* set the other properties here */ })

Using a GroupBy statement would allow you to classify everyone into their respective BPLevel, at which point you merely need to perform a ToDictionary and count the people in each category. Thus:
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic, BPSystolic = x.BPSystolic })
.AsEnumerable() // I'm not completely familiar with Entity Framework; this line may be necessary to force evaluation to continue in memory from this point forward
.GroupBy(p => ... /* test which returns a BPLevel */)
.ToDictionary(g => g.Key, g => g.Count());

I would write a private static helper function in your class that does the classification for you and insert a call to that function in your projection. Something like:
private static BPLevel ClassifyBP(int diastolic, int systolic) {
// Appropriate switch statement here
}
and then your Select projection looks like:
.Select(x => new { PersonId = x.Person.Id,
BPDiastolic = x.BPDiastolic,
BPSystolic = x.BPSystolic,
BPLevel = ClassifyBP(x.BPDiastolic, x.BPSystolic) })

This is not pretty, but the count query will be executed in the database.
var stats = context.Persons
.Select(x => new
{
Level = x.BPDiastolic < 85 && x.BPSystolic < 130
? BPLevel.Normal
: (x.BPDiastolic < 90 && x.BPSystolic < 140
? BPLevel.HighNormal
: (x.BPDiastolic < 100 && x.BPSystolic < 160
? BPLevel.HypertensionStage1
: (x.BPDiastolic < 110 && x.BPSystolic < 180
? BPLevel.ModerateHypertensionStage2
: BPLevel.SeverHypertensionStage3)))
})
.GroupBy(x => x.Level)
.Select(g => new { Level = g.Key, Count = g.Count() }) // project the count so it is computed in the database
.ToDictionary(x => x.Level, x => x.Count) // executes the query
.Union(Enum.GetValues(typeof(BPLevel))
.OfType<BPLevel>()
.ToDictionary(x => x, x => 0)) // default empty level
.GroupBy(x => x.Key)
.ToDictionary(x => x.Key, x => x.Sum(y => y.Value)); // combine both
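The final Union / GroupBy / Sum step only exists to give every level an entry, including levels with no persons. As a sketch of a possibly simpler alternative, the counts that come back from the database could be post-processed with a small helper like the one below. `EnumCounts.WithZeroDefaults` is a hypothetical name, and the generic `Enum.GetValues<TEnum>()` overload assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumCounts
{
    // Returns a dictionary with an entry for every value of TEnum,
    // using 0 for values that are missing from the input counts.
    public static Dictionary<TEnum, int> WithZeroDefaults<TEnum>(
        IReadOnlyDictionary<TEnum, int> counts) where TEnum : struct, Enum
    {
        return Enum.GetValues<TEnum>()   // requires .NET 5 or later
            .Distinct()                  // guards against enums with aliased values
            .ToDictionary(
                level => level,
                level => counts.TryGetValue(level, out var n) ? n : 0);
    }
}
```

Applied to the `BPLevel` counts, this would yield five entries regardless of how many levels actually occur in the data.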

Related

How to query the first entry in each group in NHibernate

The following code using LINQ in NHibernate returns a different result from in-memory LINQ and EF LINQ. What is the correct way to do this in NHibernate? It is fine to use QueryOver if the LINQ version is indeed broken.
using (var session = factory.OpenSession())
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < 10; ++i)
{
session.Save(new A()
{
X = i % 2,
Y = i / 2,
});
}
transaction.Commit();
}
using (var session = factory.OpenSession())
using (var transaction = session.BeginTransaction())
{
//=====================================
var expected = session.Query<A>()
.ToList() // <-- copy to memory
.GroupBy(a => a.X)
.Select(g => g.OrderBy(y => y.Y).First())
.ToList();
Console.WriteLine(string.Join(" ", expected.Select(a => a.Id)));
//=====================================
var actual = session.Query<A>()
.GroupBy(a => a.X)
.Select(g => g.OrderBy(y => y.Y).First())
.ToList();
Console.WriteLine(string.Join(" ", actual.Select(a => a.Id)));
}
public class A
{
public int Id { get; set; }
public int X { get; set; } // indexed
public int Y { get; set; } // indexed
}
Expected results
1 2
Actual results
1 1
Logged SQL
NHibernate: select (select program_a0_.Id as id1_0_ from "A" program_a0_ order by program_a0_.Y asc limit 1) as col_0_0_ from "A" program_a0_ group by program_a0_.X
The full code is in the bug report "Incorrect result when using GroupBy with First".
Update 2019-8-9
The query should not use ID. I have changed it to a non-unique property. I would appreciate it if the solution queried SQLite only once.

It seems the latest NHibernate (5.3) LINQ provider supports only aggregate functions (MIN, MAX, COUNT, ...) in the Select of a "group by" query. Selecting entities is not supported in group by queries.
As a general solution, you can rewrite your "group by" query as a subquery using the following approach:
var results = session.Query<A>()
.Where(a => a == session.Query<A>() // Subquery on same entity
.Where(sa => sa.X == a.X) // Group BY key is here
.OrderBy(sa => sa.Y) // Order By key is here
.First() // First entry in group
).ToList();
Original "group by" query for reference:
var results = session.Query<A>()
.GroupBy(a => a.X)
.Select(g => g.OrderBy(y => y.Y).First())
.ToList();
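To see that the two query shapes select the same rows, here is a small in-memory sketch using LINQ to Objects and the data from the question. It says nothing about the SQL NHibernate generates. The Ids are assigned 1 to 10 by hand to imitate an identity column, which is an assumption, and `FirstPerGroup` is a hypothetical helper class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class A
{
    public int Id { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
}

public static class FirstPerGroup
{
    // Same data as the question: ten rows with X = i % 2 and Y = i / 2.
    public static List<A> SampleData() =>
        Enumerable.Range(0, 10)
            .Select(i => new A { Id = i + 1, X = i % 2, Y = i / 2 })
            .ToList();

    // Keeps a row only if it is the first row (by Y) among the rows sharing its X.
    public static List<A> ViaSubquery(List<A> rows) =>
        rows.Where(a => a == rows
                .Where(sa => sa.X == a.X)
                .OrderBy(sa => sa.Y)
                .First())
            .ToList();

    // The original "group by" formulation, for comparison.
    public static List<A> ViaGroupBy(List<A> rows) =>
        rows.GroupBy(a => a.X)
            .Select(g => g.OrderBy(a => a.Y).First())
            .ToList();
}
```

Both methods return the rows with Id 1 and Id 2 for this data, matching the expected results in the question. In memory the subquery form is quadratic, so it is shown here only to illustrate the equivalence.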

How to use GroupBy on an index in RavenDB?

I have this document, a post:
{Content:"blabla",Tags:["test","toto"], CreatedOn:"2019-05-01 01:02:01"}
I want to have a page that displays the most used tags from the last 30 days.
So far I have tried to create an index like this:
public class Toss_TagPerDay : AbstractIndexCreationTask<TossEntity, TagByDayIndex>
{
public Toss_TagPerDay()
{
Map = tosses => from toss in tosses
from tag in toss.Tags
select new TagByDayIndex()
{
Tag = tag,
CreatedOn = toss.CreatedOn.Date,
Count = 1
};
Reduce = results => from result in results
group result by new { result.Tag, result.CreatedOn }
into g
select new TagByDayIndex()
{
Tag = g.Key.Tag,
CreatedOn = g.Key.CreatedOn,
Count = g.Sum(i => i.Count)
};
}
}
And I query it like that
await _session
.Query<TagByDayIndex, Toss_TagPerDay>()
.Where(i => i.CreatedOn >= firstDay)
.GroupBy(i => i.Tag)
.OrderByDescending(g => g.Sum(i => i.Count))
.Take(50)
.Select(t => new BestTagsResult()
{
CountLastMonth = t.Count(),
Tag = t.Key
})
.ToListAsync()
But this gives me the error
Message: System.NotSupportedException : Could not understand expression: from index 'Toss/TagPerDay'.Where(i => (Convert(i.CreatedOn, DateTimeOffset) >= value(Toss.Server.Models.Tosses.BestTagsQueryHandler+<>c__DisplayClass3_0).firstDay)).GroupBy(i => i.Tag).OrderByDescending(g => g.Sum(i => i.Count)).Take(50).Select(t => new BestTagsResult() {CountLastMonth = t.Count(), Tag = t.Key})
---- System.NotSupportedException : GroupBy method is only supported in dynamic map-reduce queries
Any idea how I can make this work? I could query all the index data from the past 30 days and do the group by / order / take in memory, but this could make my app load a lot of data.

The results from the map-reduce index you created will give you the number of tags per day. You want to have the most popular ones from the last 30 days so you need to do the following query:
var tagCountPerDay = session
.Query<TagByDayIndex, Toss_TagPerDay>()
.Where(i => i.CreatedOn >= DateTime.Now.AddDays(-30))
.ToList();
Then you can do the client-side grouping by Tag:
var mostUsedTags = tagCountPerDay.GroupBy(x => x.Tag)
.Select(t => new BestTagsResult()
{
CountLastMonth = t.Sum(x => x.Count), // sum the per-day counts; t.Count() would only count the days
Tag = t.Key
})
.OrderByDescending(g => g.CountLastMonth)
.ToList();
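The client-side step can be tried without a RavenDB server. The sketch below uses made-up per-day entries and hypothetical record types (`TagByDay`, `TagTotal`) that mirror the shapes in the question. It totals the per-day `Count` values for each tag; calling `Count()` on the group would instead give the number of days on which the tag appeared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes mirroring the index entries and the result type in the question.
public record TagByDay(string Tag, DateTime CreatedOn, int Count);
public record TagTotal(string Tag, int CountLastMonth);

public static class TagStats
{
    public static List<TagTotal> MostUsed(IEnumerable<TagByDay> perDay, int take) =>
        perDay
            .GroupBy(e => e.Tag)
            // Sum the per-day counts; g.Count() would give the number of days instead.
            .Select(g => new TagTotal(g.Key, g.Sum(e => e.Count)))
            .OrderByDescending(t => t.CountLastMonth)
            .Take(take)
            .ToList();
}
```

Since the index already reduces to one entry per tag per day, the amount of data loaded is bounded by the number of distinct tags times 30, not by the number of posts.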

@Kuepper
Based on your index definition, you can handle that with the following index:
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
public TrendingSongs()
{
Map = events => from e in events
where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
select new Result
{
TrackId = e.TrackId,
Count = 1,
Timestamp = new DateTime(e.TimestampStart.Year, e.TimestampStart.Month, e.TimestampStart.Day)
};
Reduce = results => from r in results
group r by new {r.TrackId, r.Timestamp}
into g
select new Result
{
TrackId = g.Key.TrackId,
Count = g.Sum(x => x.Count),
Timestamp = g.Key.Timestamp
};
}
}
and the query using facets:
from index TrendingSongs where Timestamp between $then and $now select facet(TrackId, sum(Count))

The reason for the error is that you can't use 'GroupBy' in a query made on an index.
'GroupBy' can be used when performing a 'dynamic query',
i.e. a query that is made on a collection, without specifying an index.
See:
https://ravendb.net/docs/article-page/4.1/Csharp/client-api/session/querying/how-to-perform-group-by-query

I solved a similar problem by using AdditionalSources with dynamic values.
I then update the index every morning to move the Earliest timestamp forward:
await IndexCreation.CreateIndexesAsync(new AbstractIndexCreationTask[] {new TrendingSongs()}, _store);
I still have to try it in production, but in my tests so far it looks a lot faster than the alternatives. It does feel pretty hacky, though, and I'm surprised RavenDB does not offer a better solution.
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
public DateTime Earliest = DateTime.UtcNow.AddDays(-16);
public TrendingSongs()
{
Map = events => from e in events
where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
&& e.TimestampStart > new DateTime(TrendingHelpers.Year, TrendingHelpers.Month, TrendingHelpers.Day)
select new Result
{
TrackId = e.TrackId,
Count = 1
};
Reduce = results => from r in results
group r by new {r.TrackId}
into g
select new Result
{
TrackId = g.Key.TrackId,
Count = g.Sum(x => x.Count)
};
AdditionalSources = new Dictionary<string, string>
{
{
"TrendingHelpers",
@"namespace Helpers
{
public static class TrendingHelpers
{
public static int Day = "+Earliest.Day+@";
public static int Month = "+Earliest.Month+@";
public static int Year = "+Earliest.Year+@";
}
}"
}
};
}
}

implement dense rank with linq

Using the following linq code, how can I add dense_rank to my results? If that's too slow or complicated, how about just the rank window function?
var x = tableQueryable
.Where(where condition)
.GroupBy(cust=> new { fieldOne = cust.fieldOne ?? string.Empty, fieldTwo = cust.fieldTwo ?? string.Empty})
.Where(g=>g.Count()>1)
.ToList()
.SelectMany(g => g.Select(cust => new {
cust.fieldOne
, cust.fieldTwo
, cust.fieldThree
}));

This does a dense_rank(). Change the GroupBy and the Order according to your need :)
Basically, dense_rank is numbering the ordered groups of a query so:
var DenseRanked = data.Where(item => item.Field2 == 1)
//Grouping the data by the wanted key
.GroupBy(item => new { item.Field1, item.Field3, item.Field4 })
.Where(@group => @group.Any())
// Now that I have the groups I decide how to arrange the order of the groups
.OrderBy(@group => @group.Key.Field1 ?? string.Empty)
.ThenBy(@group => @group.Key.Field3 ?? string.Empty)
.ThenBy(@group => @group.Key.Field4 ?? string.Empty)
// Because linq to entities does not support the following select overloads I'll cast it to an IEnumerable - notice that any data that i don't want was already filtered out before
.AsEnumerable()
// Using this overload of the select I have an index input parameter. Because my scope of work is the groups then it is the ranking of the group. The index starts from 0 so I do the ++ first.
.Select((@group, i) => new
{
Items = @group,
Rank = ++i
})
// I'm seeking the individual items and not the groups so I use select many to retrieve them. This overload gives me both the item and the groups - so I can get the Rank field created above
.SelectMany(v => v.Items, (s, i) => new
{
Item = i,
DenseRank = s.Rank
}).ToList();
Another way is the one given in Manoj's answer to this question, but I like it less because it selects from the table twice.

So if I understand this correctly, the dense rank is the index of the group it would be when the groups are ordered.
var query = db.SomeTable
.GroupBy(x => new { x.Your, x.Key })
.OrderBy(g => g.Key.Your).ThenBy(g => g.Key.Key)
.AsEnumerable()
.Select((g, i) => new { g, i })
.SelectMany(x =>
x.g.Select(y => new
{
y.Your,
y.Columns,
y.And,
y.Key,
DenseRank = x.i,
}
));

var denseRanks = myDb.tblTestReaderCourseGrades
.GroupBy(x => new { x.Grade })
.OrderByDescending(g => g.Key.Grade)
.AsEnumerable()
.Select((g, i) => new { g, i })
.SelectMany(x =>
x.g.Select(y => new
{
y.Serial,
Rank = x.i + 1,
}
));
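The same idea can be tried on plain in-memory data: group, order the groups, use the group's index as the rank, then flatten. The sketch below ranks integers from highest to lowest. `Ranking.DenseRankDescending` is a hypothetical helper, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ranking
{
    // Returns each value paired with its dense rank (1-based, highest value first).
    // Equal values share a rank, and no rank numbers are skipped.
    public static List<(int Value, int DenseRank)> DenseRankDescending(IEnumerable<int> values) =>
        values
            .GroupBy(v => v)
            .OrderByDescending(g => g.Key)
            .Select((g, i) => (Group: g, Rank: i + 1))   // i is the zero-based index of the group
            .SelectMany(x => x.Group.Select(v => (Value: v, DenseRank: x.Rank)))
            .ToList();
}
```

For the input 90, 85, 90, 70 this yields (90, 1), (90, 1), (85, 2), (70, 3).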

Lambda not equal on join

Table 1 called Category contains 70 records
Table 2 called FilterCategorys contains 0 records (currently).
With my lambda join, I want to pull only the records that don't match, so in this case I expect to get 70 records back. Here's my incorrect lambda:
var filteredList = categorys
.Join(filterCategorys,
x => x.Id,
y => y.CategoryId,
(x, y) => new { catgeory = x, filter = y })
.Where(xy => xy.catgeory.Id != xy.filter.CategoryId)
.Select(xy => new Category()
{
Name = xy.catgeory.Name,
Id = xy.catgeory.Id,
ParentCategoryId = xy.catgeory.ParentCategoryId
})
.ToList();
What's the correct syntax I need here?

Not sure if you have a requirement of using lambdas (rather than query syntax), but I prefer query syntax for statements that have outer joins.
This should be equivalent:
var filteredList = (
from c in Categorys
join fc in FilterCategorys on c.Id equals fc.CategoryId into outer
from o in outer.DefaultIfEmpty()
select new
{
Category = new Category
{
Name = c.Name,
Id = c.Id,
ParentCategoryId = c.ParentCategoryId
},
Exists = (o != null)
})
.Where(c => !c.Exists)
.Select(c => c.Category);
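For readers who do want method (lambda) syntax, the same left outer join followed by a "no match" filter can be sketched with `GroupJoin`. This runs on in-memory collections; the `Category` and `FilterCategory` records are hypothetical stand-ins whose property names are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two tables in the question.
public record Category(int Id, string Name, int? ParentCategoryId);
public record FilterCategory(int Id, int CategoryId);

public static class CategoryFilter
{
    // Returns the categories that have no matching FilterCategory row.
    public static List<Category> WithoutFilter(
        IEnumerable<Category> categories, IEnumerable<FilterCategory> filters) =>
        categories
            .GroupJoin(filters,
                c => c.Id,
                f => f.CategoryId,
                (c, matches) => new { Category = c, Matches = matches })
            .Where(x => !x.Matches.Any())   // keep only categories with an empty match list
            .Select(x => x.Category)
            .ToList();
}
```

With an empty filter collection, every category is returned, which is the 70-record result the question expects.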

If you want to do it in purely lambda:
var match = categorys.Join(filterCategorys, x => x.Id, y => y.CategoryId, (x, y) => new { Id = x.Id });
var filteredList = categorys.Where(x => !match.Contains(new {Id = x.Id}));
I haven't measured the performance of this, but for 70 records, optimization is not an issue.

Well I came up with a solution that takes away the need for the join.
var currentIds = filterCategorys.Select(x => x.CategoryId).ToList();
var filteredList = categorys.Where(x => !currentIds.Contains(x.Id));
This is very similar to @Zoff Dino's answer. I'm not sure about performance; maybe someone would like to check.

Try this:
var categories= ...
var filteredCategories=...
var allExceptFiltered = categories.Except(filteredCategories, new CategoryComparer()).ToList();
If you don't provide a custom comparer, the framework has no way of knowing that two Category objects are the same (even if they have the same ID); it treats them as different objects because it checks for reference equality.
So you must add this class to your project:
public class CategoryComparer: IEqualityComparer<Category>
{
public bool Equals(Category x, Category y)
{
if (x == null && y == null)
return true;
if (x == null)
return false;
if (y == null)
return false;
return x.CategoryId == y.CategoryId; // compare the keys directly rather than their hash codes
}
public int GetHashCode(Category obj)
{
return obj.CategoryId.GetHashCode();
}
}
Update
Also check out Wyatt Earp's answer; it is very useful to know how to do an outer join.
Update 2
Your problem is the Join method.
The Where clause is applied after the join. Once you have joined the lists on the ID, every remaining pair has matching IDs, so filtering for pairs whose IDs differ returns no results.
Could you add brackets around the condition? It should work.
....Where(xy => (xy.catgeory.Id != xy.filter.CategoryId))

Group By Multiple Columns

How can I do GroupBy multiple columns in LINQ?
Something similar to this in SQL:
SELECT * FROM <TableName> GROUP BY <Column1>,<Column2>
How can I convert this to LINQ:
QuantityBreakdown
(
MaterialID int,
ProductID int,
Quantity float
)
INSERT INTO #QuantityBreakdown (MaterialID, ProductID, Quantity)
SELECT MaterialID, ProductID, SUM(Quantity)
FROM #Transactions
GROUP BY MaterialID, ProductID

Use an anonymous type.
Eg
group x by new { x.Column1, x.Column2 }

Procedural sample:
.GroupBy(x => new { x.Column1, x.Column2 })

Ok got this as:
var query = (from t in Transactions
group t by new {t.MaterialID, t.ProductID}
into grp
select new
{
grp.Key.MaterialID,
grp.Key.ProductID,
Quantity = grp.Sum(t => t.Quantity)
}).ToList();

To group by multiple columns, try this instead:
GroupBy(x=> new { x.Column1, x.Column2 }, (key, group) => new
{
Key1 = key.Column1,
Key2 = key.Column2,
Result = group.ToList()
});
In the same way you can add Column3, Column4, etc.

Since C# 7 you can also use value tuples:
group x by (x.Column1, x.Column2)
or
.GroupBy(x => (x.Column1, x.Column2))

C# 7.1 or greater, using tuples and inferred tuple element names (currently this works only with LINQ to Objects; it is not supported where expression trees are required, e.g. someIQueryable.GroupBy(...); see the GitHub issue):
// declarative query syntax
var result =
from x in inMemoryTable
group x by (x.Column1, x.Column2) into g
select (g.Key.Column1, g.Key.Column2, QuantitySum: g.Sum(x => x.Quantity));
// or method syntax
var result2 = inMemoryTable.GroupBy(x => (x.Column1, x.Column2))
.Select(g => (g.Key.Column1, g.Key.Column2, QuantitySum: g.Sum(x => x.Quantity)));
C# 3 or greater using anonymous types:
// declarative query syntax
var result3 =
from x in table
group x by new { x.Column1, x.Column2 } into g
select new { g.Key.Column1, g.Key.Column2, QuantitySum = g.Sum(x => x.Quantity) };
// or method syntax
var result4 = table.GroupBy(x => new { x.Column1, x.Column2 })
.Select(g =>
new { g.Key.Column1, g.Key.Column2 , QuantitySum= g.Sum(x => x.Quantity) });
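As a concrete, runnable version of the anonymous-type approach, the sketch below applies it to the MaterialID / ProductID / Quantity columns from the question. The `Transaction` record and the `Breakdown.Summarize` helper are hypothetical names, and the data is in memory rather than in a database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with the columns named in the question.
public record Transaction(int MaterialID, int ProductID, double Quantity);

public static class Breakdown
{
    // Equivalent of: SELECT MaterialID, ProductID, SUM(Quantity) ... GROUP BY MaterialID, ProductID
    public static List<(int MaterialID, int ProductID, double Quantity)> Summarize(
        IEnumerable<Transaction> transactions) =>
        transactions
            .GroupBy(t => new { t.MaterialID, t.ProductID })   // composite key as an anonymous type
            .Select(g => (g.Key.MaterialID, g.Key.ProductID, Quantity: g.Sum(t => t.Quantity)))
            .ToList();
}
```

Anonymous types compare by the values of their properties, which is what makes them usable as a composite grouping key.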

You can also use a Tuple<> for a strongly-typed grouping.
from grouping in list.GroupBy(x => new Tuple<string,string,string>(x.Person.LastName,x.Person.FirstName,x.Person.MiddleName))
select new SummaryItem
{
LastName = grouping.Key.Item1,
FirstName = grouping.Key.Item2,
MiddleName = grouping.Key.Item3,
DayCount = grouping.Count(),
AmountBilled = grouping.Sum(x => x.Rate),
}

Though this question is asking about grouping by class properties, if you want to group by multiple columns against an ADO object (like a DataTable), you have to assign your "new" items to variables:
EnumerableRowCollection<DataRow> ClientProfiles = CurrentProfiles.AsEnumerable()
.Where(x => CheckProfileTypes.Contains(x.Field<object>(ProfileTypeField).ToString()));
// do other stuff, then check for dups...
var Dups = ClientProfiles.AsParallel()
.GroupBy(x => new { InterfaceID = x.Field<object>(InterfaceField).ToString(), ProfileType = x.Field<object>(ProfileTypeField).ToString() })
.Where(z => z.Count() > 1)
.Select(z => z);

var Results= query.GroupBy(f => new { /* add members here */ });

One thing to note is that you need to use an anonymous type in the lambda expression; you can't use an instance of an ordinary class.
Example:
public class Key
{
public string Prop1 { get; set; }
public string Prop2 { get; set; }
}
This will compile, but it will generate one key per cycle, because instances of a plain class are compared by reference:
var groupedCycles = cycles.GroupBy(x => new Key
{
Prop1 = x.Column1,
Prop2 = x.Column2
});
If you want to name the key properties and then retrieve them, you can do it like this instead. This will group correctly and give you the key properties.
var groupedCycles = cycles.GroupBy(x => new
{
Prop1 = x.Column1,
Prop2 = x.Column2
});
foreach (var groupedCycle in groupedCycles)
{
var key = new Key();
key.Prop1 = groupedCycle.Key.Prop1;
key.Prop2 = groupedCycle.Key.Prop2;
}
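The difference comes down to equality: a plain class compares by reference, while anonymous types compare by value. As a sketch under the assumption that C# 9 or later is available and the query runs as LINQ to Objects, a record can serve as a named key type because the compiler generates value-based Equals and GetHashCode for it. A query provider such as EF may not be able to translate a record constructor, so this is not a general replacement. All type names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain class: instances are compared by reference.
public class ClassKey
{
    public string Prop1 { get; set; } = "";
    public string Prop2 { get; set; } = "";
}

// Record: the compiler generates value-based Equals and GetHashCode.
public record RecordKey(string Prop1, string Prop2);

public static class KeyDemo
{
    // Every row gets its own group, because no two ClassKey instances are equal.
    public static int GroupCountWithClass(IEnumerable<(string Column1, string Column2)> rows) =>
        rows.GroupBy(r => new ClassKey { Prop1 = r.Column1, Prop2 = r.Column2 }).Count();

    // Rows with the same column values share a group.
    public static int GroupCountWithRecord(IEnumerable<(string Column1, string Column2)> rows) =>
        rows.GroupBy(r => new RecordKey(r.Column1, r.Column2)).Count();
}
```

For three rows of which two are identical, the class-keyed query produces three groups and the record-keyed query produces two.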

group x by new { x.Col, x.Col}

.GroupBy(x => (x.MaterialID, x.ProductID))

.GroupBy(x => x.Column1 + " " + x.Column2)

For VB and anonymous/lambda:
query.GroupBy(Function(x) New With {Key x.Field1, Key x.Field2, Key x.FieldN })
