I have this simple expression to get each Order's amount:
public IQueryable<Orders> GetAccountSummery()
{
    return context.Orders.GroupBy(a => new { orderNo = a.orderNo })
        .Select(b => new
        {
            orderNo = b.Key.orderNo,
            amount = b.Sum(r => r.amount)
        });
}
I need to get the total number of records returned by the previous expression:
SQL
select COUNT(1) from
(
    SELECT orderNo, SUM(amount) Amount
    FROM Orders
    group by orderNo
) tbl -- I get a 125,000 row count here
EF
public int GetOrdersCount()
{
    return GetAccountSummery().Count(); // This gives 198,000, which counts all rows from the Orders table
    // The following line gives the correct row count:
    return GetAccountSummery().AsEnumerable().Count(); // 125,000 rows
}
The problem with GetAccountSummery().AsEnumerable().Count() is that it runs the query on the server first, then calculates the correct row count on the client side (consider the table size here).
Is there any way to get only the correct count without executing the select statement ?
EDIT
If that is not possible with a GroupBy subquery, why is it possible for Where subqueries?
The way you currently have it structured? No. You'll always execute the .Select() statement because .Count() employs immediate execution (see here for a list). However, .Count() also has an overload which takes a predicate, which you should be able to use to grab the count without having to perform the select first, like so:
context.Orders.GroupBy(a => new { orderNo = a.orderNo })
.Count(a => a.Key.orderNo != string.Empty);
EDIT: Whoops, forgot that it was a bool. Edited accordingly.
EDIT 2: Per the comments, I don't think it is possible, since you'll always need to have that select in there at any given time, and .Count() will always call it. Sorry.
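A workaround worth noting (not part of the answer above, so treat it as a sketch): when grouping by a single key, the number of groups equals the number of distinct keys, and Select(...).Distinct().Count() is typically translatable to a single server-side CONT over a DISTINCT subquery. The equivalence can be illustrated in memory with hypothetical data:

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Hypothetical in-memory orders standing in for context.Orders
        var orders = new[]
        {
            new { orderNo = "A1", amount = 10m },
            new { orderNo = "A1", amount = 5m },
            new { orderNo = "B2", amount = 7m },
        };

        // Counting the grouped result...
        var groupCount = orders.GroupBy(o => o.orderNo).Count();

        // ...equals counting the distinct keys, since each group has exactly one key.
        var distinctCount = orders.Select(o => o.orderNo).Distinct().Count();

        Console.WriteLine(groupCount);    // 2
        Console.WriteLine(distinctCount); // 2
    }
}
```

Against the database, `context.Orders.Select(o => o.orderNo).Distinct().Count()` should therefore give the 125,000 figure without materializing the grouped rows, assuming the provider translates Distinct().Count() server side.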
Related
Consider this as the values in column Emp_code.
E1000
E1001
E9000
E4000
E1339
E10000
I'm using this code to first remove the E from all of the occurrences, then convert them into numbers, then apply OrderByDescending to the list.
var idd = db?.HRMS_EmpMst_TR?.Where(a => a.Emp_code != null)?
    .Select(x => x.Emp_code.Remove(0, 1))
    .Select(int.Parse)
    .OrderByDescending(y => y)
    .First();
Can somebody help me with this code? I want to get 10000 as the answer.
Thanks for the help!
You need to
1. Use TrimStart('E') to remove the E char from each string and parse the result to an integer.
2. Get the Max value from the processed sequence.
var input = new List<string>() { "E1000", "E1001", "E9000", "E4000", "E1339" };
var result = input
    .Select(x => int.Parse(x.TrimStart('E'))) // Remove E, then parse the string to an integer
    .Max();                                   // Get the max value from the IEnumerable
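Applied to the original data (including the five-digit E10000), the same approach gives the asker's expected answer. This is an in-memory sketch; against a real database you would need AsEnumerable() before the parse, since int.Parse cannot be translated to SQL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Hypothetical in-memory stand-in for db.HRMS_EmpMst_TR's Emp_code column
        var codes = new List<string> { "E1000", "E1001", "E9000", "E4000", "E1339", "E10000" };

        var max = codes
            .Where(c => c != null)
            .Select(c => int.Parse(c.TrimStart('E')))
            .Max();

        Console.WriteLine(max); // 10000
    }
}
```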
You didn't say so, but I think you are working with a database, so you are working with IQueryable, not IEnumerable. This means that you can't use methods like String.TrimStart or int.Parse.
So you have something called db, which you didn't tell us much about. I assume it is a DbContext or something similar used to access a database management system.
This DbContext has a table HRMS_EmpMst_TR, filled with rows of which I don't know what they represent (please, next time give us some more information!). What I do know is that there are no null rows in this table, so your Where is meaningless.
By the way, are you certain that db is not null?
if (db == null) return null;
After this, we know that db.HRMS_EmpMst_TR is a non-null, possibly empty sequence of rows, where every row has a string column EmpCode. Every EmpCode starts with the character E followed by a four-digit number. You want the EmpCode with the largest number.
string largestEmpCode = db.HRMS_EmpMst_TR
    .OrderByDescending(row => row.EmpCode)
    .Select(row => row.EmpCode)
    .FirstOrDefault();
You get the string "E9000", or null if the table is empty. If you want 9000, just remove the first character and parse. (Beware that this string ordering is only correct while all codes have the same number of digits: as a string, "E9000" sorts after "E10000".) What do you want if the table is empty?
if (largestEmpCode != null)
{
    int largestEmpCodeValue = Int32.Parse(largestEmpCode.Substring(1));
}
else
{
    // TODO: handle empty table.
}
There is room for improvement
If you are certain that every EmpCode is the character E followed by a four-digit number, and you want to do calculations with this number, consider changing the EmpCode column to an integer column, without the E. This is a one-time action, and it will make future calculations much easier.
Database column:
int EmpCodeValue;
LINQ to get the largest EmpCodeValue:
int largestEmpCodeValue = db.HRMS_EmpMst_TR
.Select(row => row.EmpCodeValue)
.Max();
If other parts of your application really need an "E followed by four digits", you can always make an extension method. I don't know what the rows of HRMS_EmpMst_TR are; let's assume the element type is EmpMst.
public static string GetEmpCode(this EmpMst empMst)
{
    return String.Format("E{0:D4}", empMst.EmpCodeValue);
}
The D4 format specifier pads the value with leading zeros to four digits, so integer 4 becomes "0004". (Remember that an extension method must be declared in a static class.)
Usage:
List<EmpMst> fetchedEmpMsts = ...
string firstEmpCode = fetchedEmpMsts[0].GetEmpCode();
Or:
var result = db.HRMS_EmpMst_TR
.Where(empMst => empMst.Name == ...) // or use some other filter, just an example
.AsEnumerable()
.Select(empMst => new
{
Id = empMst.Id,
Name = empMst.Name,
EmpCode = empMst.GetEmpCode(),
...
});
I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId)
{
    var portfolio = DbContext.Portfolios.FirstOrDefault(x => x.Id.Equals(portfolioId));
    if (portfolio == null) return 0m;

    return portfolio.Items
        .Where(i => i.Status == ItemStatus.Listed
                 && _activateStatuses.Contains(i.Category.Status))
        .Sum(i => i.Amount);
}
So I want to fetch the value of all my items that have a certain status and whose parent has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
i.id, i.amount
FROM
items i
INNER JOIN categories c ON c.id = i.category_id
WHERE
i.portfolio_id = #portfolioId
AND
i.status = 'listed'
AND
c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
ToTable("items");
...
HasRequired(p => p.Portfolio);
}
public class CategoryConfiguration : EntityTypeConfiguration<Category> {
ToTable("categories");
...
HasMany(c => c.Products).WithRequired(p => p.Category);
}
EDIT based on comments:
I didn't think it was important, but _activeStatuses is a list of enums.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) which I map to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status))
.Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
.SelectMany(p => p.Items.Where(i =>
i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status)))
.Select(i => i.Amount).Sum();
The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.
Calling portfolio.Items will lazy load the Items collection and then execute the subsequent calls, including the Where and Sum expressions, in memory. See also the Loading Related Entities article.
You need to execute the call directly on the DbContext so the Sum expression can be evaluated on the database server side.
var portfolio = DbContext.Portfolios
    .Where(x => x.Id.Equals(portfolioId))
    .SelectMany(x => x.Items
        .Where(i => i.Status == ItemStatus.Listed
                 && _activateStatuses.Contains(i.Category.Status))
        .Select(i => i.Amount))
    .Sum();
You also have to use the appropriate element type for the _activateStatuses instance, as the contained values must match the type persisted in the database. If the database persists string values, then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a LINQ expression to convert the enums to their string representations.
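A minimal sketch of that conversion (assuming a hypothetical CategoryStatus enum, and lowercase database strings as described in the question's edit):

```csharp
using System;
using System.Linq;

class Demo
{
    enum CategoryStatus { Active, Pending }

    static void Main()
    {
        // Convert the enum values once, up front, so the query passes plain strings
        // that the provider can translate into an IN (...) clause.
        var activateStatuses = Enum.GetValues(typeof(CategoryStatus))
            .Cast<CategoryStatus>()
            .Select(s => s.ToString().ToLowerInvariant())
            .ToArray();

        Console.WriteLine(string.Join(",", activateStatuses)); // active,pending
    }
}
```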
Notes
I would recommend you turn off lazy loading on your DbContext type. As soon as you do that, you will start to catch issues like this at run time via exceptions, and can then write more performant code.
I did not include error checking for the case where no portfolio is found, but you could extend this code accordingly.
Yep, CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing EF to run the Where in .NET. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
So that the contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and can all be converted to SQL
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };
I would like to know if it's possible to create a query which will select a row only if a field in the previous row has a value smaller than the value in the actual row.
If you check this screenshot:
The row with ID 7 will not be selected because the row after it has a value which is less than the actual value. So I would like to know if there is a LINQ to Entities query which will help me exclude rows such as the one with ID 7 from the select results.
If your IDs are guaranteed to be consecutive, you can join the table to itself by ID onto ID - 1:
var q = from x in test
        join y in test on x.ID equals y.ID - 1
        where y.StopOrder >= y.ID
        select x;
You'd have to think about boundary conditions, you might want the equivalent of a left join.
If your ids are not consecutive, you can do something like:
var q = from x in test
from y in test
where y.ID > x.ID
group y by x into g
where g.Min().ID <= g.Min().StopOrder
select g.Key;
For this to work you need to define IComparable on the table type. In my test I used:
struct X : IComparable<X>
{
    public int ID;
    public int StopOrder;
    public int CompareTo(X other)
    {
        return ID.CompareTo(other.ID);
    }
}
This will still never return the last row.
If you're using SQL2012 or higher and want to drop down to SQL, you can use the lead function. This will only scan the table once (assuming an index on ID):
with x as (
select
t.ID,
t.StopOrder,
lead(id, 1) over (order by id) as NextID,
lead(StopOrder, 1) over (order by id) as NextStopOrder
from
test t
) select
x.ID,
x.StopOrder
from
x
where
x.NextId <= x.NextStopOrder;
Also, if you want the last row, you can just add or x.NextID is null to the end
As SQL is declarative, the DBMS can make plenty of decisions about how it retrieves the result, and this may well involve multithreading.
When this occurs, there is no guarantee about the order in which the threads will complete, and therefore, the order in which the results will be returned - even if you have a primary key defined. This article is a good explanation of what goes on there.
Therefore, I'd recommend getting the consecutive numbering sorted by the database, but ensuring that the results are then ordered once they're back from the database, and then carrying out the filtering on the ordered result set in C#.
If the table is not too big to pull all records, it might be easier to write an extension method that does the filtering (sorry, I didn't quite understand the condition you meant):
public static class MyTypeExtensions
{
    public static IEnumerable<MyType> FilterOnStopOrder(this IEnumerable<MyType> source)
    {
        MyType previous = null;
        foreach (var item in source.OrderBy(s => s.ID))
        {
            // or whatever condition...
            if (previous != null && previous.StopOrder < item.StopOrder)
            {
                yield return item;
            }
            previous = item;
        }
    }
}
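A usage sketch of that extension method, with a hypothetical MyType and sample data (the extension is repeated here so the sketch is self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyType
{
    public int ID { get; set; }
    public int StopOrder { get; set; }
}

static class MyTypeExtensions
{
    public static IEnumerable<MyType> FilterOnStopOrder(this IEnumerable<MyType> source)
    {
        MyType previous = null;
        foreach (var item in source.OrderBy(s => s.ID))
        {
            // keep a row only when the previous row's StopOrder is smaller
            if (previous != null && previous.StopOrder < item.StopOrder)
                yield return item;
            previous = item;
        }
    }
}

class Demo
{
    static void Main()
    {
        var rows = new List<MyType>
        {
            new MyType { ID = 1, StopOrder = 5 }, // first row: no previous, excluded
            new MyType { ID = 2, StopOrder = 3 }, // previous StopOrder 5 is not smaller: excluded
            new MyType { ID = 3, StopOrder = 7 }, // previous StopOrder 3 is smaller: included
        };

        var kept = rows.FilterOnStopOrder().Select(r => r.ID).ToList();
        Console.WriteLine(string.Join(",", kept)); // 3
    }
}
```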
I'm searching for a bunch of int32's in a SQL (Compact edition) database using LINQ2SQL.
My main problem is that I have a large list (thousands) of int32 and I want all records in the DB where id field in DB matches any of my int32's. Currently I'm selecting one row at the time, effectively searching the index thousands of times.
How can I optimize this? Temp table?
This sounds like you could use a Contains query:
int[] intArray = ...;
var matches = from item in context.SomeTable
where intArray.Contains(item.id)
select item;
For searching for thousands of values, your options are:
Send an XML block to a stored procedure (complex, but doable)
Create a temp table, bulk upload the data, then join onto it (can cause problems with concurrency)
Execute multiple queries (i.e. break your group of IDs into chunks of a thousand or so and use BrokenGlass's solution)
I'm not sure which you can do with Compact Edition.
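Option 3 (chunked Contains queries) can be sketched as follows. The "table" here is an in-memory stand-in, since Compact Edition specifics vary; against a real context, each chunk would become one Contains query. Enumerable.Chunk requires .NET 6+; on older targets you would hand-roll the batching:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Hypothetical in-memory stand-in for the database table's id column
        var table = Enumerable.Range(1, 10000).ToList();
        var idsToFind = new List<int> { 5, 1500, 9999 };

        var results = new List<int>();
        foreach (var chunk in idsToFind.Chunk(1000)) // .NET 6+; hand-roll for older targets
        {
            // Against a real context: context.SomeTable.Where(r => chunk.Contains(r.id))
            results.AddRange(table.Where(id => chunk.Contains(id)));
        }

        Console.WriteLine(string.Join(",", results)); // 5,1500,9999
    }
}
```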
Insert your ints into a SQL table, then do:
var items = from row in table
            join intRow in intTable on row.TheIntColumn equals intRow.IntColumn
            select row;
Edit 1 & 2: Changed the answer so he joins 2 tables, no collections.
My preference would be writing a stored procedure for the search. If you have an index on the field that you are searching, it will make life a lot easier for you in the future when the number of rows to process increases.
The complexity you will come across is writing a select statement that can do an IN clause from an input parameter. What you need is a table-valued function to convert the string (of IDs) into a column, and to use that column in the IN clause.
like:
Select *
From SomeTable So
Where So.ID In (Select Column1 From dbo.StringToTable(InputIds))
I've come up with this LINQ solution after getting tired of writing manual batching code.
It's not perfect (i.e. the batch sizes are not exactly even), but it solves the problem.
It's very useful when you are not allowed to write stored procs or SQL functions, and works with almost every LINQ expression.
Enjoy:
public static IQueryable<TResultElement> RunQueryWithBatching<TBatchElement, TResultElement>(this IList<TBatchElement> listToBatch, int batchSize, Func<List<TBatchElement>, IQueryable<TResultElement>> initialQuery)
{
    return RunQueryWithBatching(listToBatch, initialQuery, batchSize);
}

public static IQueryable<TResultElement> RunQueryWithBatching<TBatchElement, TResultElement>(this IList<TBatchElement> listToBatch, Func<List<TBatchElement>, IQueryable<TResultElement>> initialQuery)
{
    return RunQueryWithBatching(listToBatch, initialQuery, 0);
}

public static IQueryable<TResultElement> RunQueryWithBatching<TBatchElement, TResultElement>(this IList<TBatchElement> listToBatch, Func<List<TBatchElement>, IQueryable<TResultElement>> initialQuery, int batchSize)
{
    if (listToBatch == null)
        throw new ArgumentNullException("listToBatch");
    if (initialQuery == null)
        throw new ArgumentNullException("initialQuery");
    if (batchSize <= 0)
        batchSize = 1000;

    int batchCount = (listToBatch.Count / batchSize) + 1;

    // Assign each element a batch key by index
    var batchGroup = listToBatch.AsQueryable()
        .Select((elem, index) => new { GroupKey = index % batchCount, BatchElement = elem });

    var keysBatchGroup = from obj in batchGroup
                         group obj by obj.GroupKey into grouped
                         select grouped;

    var groupedBatches = keysBatchGroup.Select(group => group.Select(g => g.BatchElement));

    var map = from batch in groupedBatches
              let batchResult = initialQuery(batch.ToList()).ToList() // force to memory because of a translation error in linq2sql
              from br in batchResult
              select br;

    return map;
}
usage:
using (var context = new SourceDataContext())
{
// some code
var myBatchResult = intArray.RunQueryWithBatching(batch => from v1 in context.Table where batch.Contains(v1.IntProperty) select v1, 2000);
// some other code that makes use of myBatchResult
}
Then either use the result as-is, expand it to a list, or whatever you need. Just make sure you don't lose the DataContext reference.
I need to add a literal value to a query. My attempt
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
In the above example, I get an error:
"Local sequence cannot be used in LINQ to SQL implementation
of query operators except the Contains() operator."
If I am using Entity Framework 4 for example, what could I add to the Union statement to always include the "seed" ID?
I am trying to produce SQL code like the following:
select distinct ID
from product
union
select 0 as ID
So later I can join the list to itself so I can find all values where the next highest value is not present (finding the lowest available ID in the set).
Edit: Original Linq Query to find lowest available ID
var skuQuery = Context.Products
.Where(p => p.sku > skuSeedStart &&
p.sku < skuSeedEnd)
.Select(p => p.sku).Distinct();
var lowestSkuAvailableList =
(from p1 in skuQuery
from p2 in skuQuery.Where(a => a == p1 + 1).DefaultIfEmpty()
where p2 == 0 // zero is default for long where it would be null
select p1).ToList();
var Answer = (lowestSkuAvailableList.Count == 0
? skuSeedStart :
lowestSkuAvailableList.Min()) + 1;
This code creates two SKU sets offset by one, then selects the SKU where the next highest doesn't exist. Afterward, it selects the minimum of that (lowest SKU where next highest is available).
For this to work, the seed must be in the set joined together.
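The gap-finding logic described above can be sketched in memory with hypothetical data (seed 0 already unioned in, so the lowest available ID after a gap is found correctly):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Distinct SKUs with the seed 0 already included; 3 is the gap
        var skus = new List<long> { 0, 1, 2, 4, 5 };

        var lowestAvailable = skus
            .Where(s => !skus.Contains(s + 1)) // SKUs whose successor is missing
            .Min() + 1;

        Console.WriteLine(lowestAvailable); // 3
    }
}
```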
Your problem is that your query is being turned entirely into a LINQ-to-SQL query, when what you need is a LINQ-to-SQL query with local manipulation on top of it.
The solution is to tell the compiler that you want to use LINQ-to-Objects after processing the query (in other words, change the extension method resolution to look at IEnumerable<T>, not IQueryable<T>). The easiest way to do this is to tack AsEnumerable() onto the end of your query, like so:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().AsEnumerable().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
Up front: not answering exactly the question you asked, but solving your problem in a different way.
How about this:
var a = Products.Select(p => p.sku).Distinct().ToList();
a.Add(0);
a.Dump(); // LinqPad's way of showing the values
You should create a database table for storing constant values and pass a query over this table to the Union operator.
For example, imagine a table "Defaults" with fields "Name" and "Value", holding only one record ("SKU", 0).
Then you can rewrite your expression like this:
var zero = context.Defaults.Where(_=>_.Name == "SKU").Select(_=>_.Value);
var result = context.Products.Select(p => p.sku).Distinct().Union(zero).ToList();