How to get item value and item count in linq c# - c#

I have an sql database table named hate,
I want to get each items name and its count by linq query
that is my codes:
var qLocation = (from L in db.Hato
where L.HatoRecDate >= startDate && L.HatoRecDate <= endDate
group L by L.HatoLocation into g
select new { HatoLocation = g.Key, count = g.Count() })
.OrderByDescending(o => o.count).ToList();
var l = qLocation[0].HatoLocation;
var c = qLocation[0].count;
It gives me item name; but shows 0 result for any item count
please, tell me where is wrong with my code?
Update
After feedback I have captured the following output, what is interesting is that it is only ever the last record in the set that has a zero count:

Your code looks OK, I see no syntax issues with the query itself, what you need is a few tricks that will help you debug this.
When you run this with an In-Memory record set it behaves as expected, this means that the issue is in the generated SQL that your Linq query is translated into via the DbContext.
As a proof for your In-Memory, review this fiddle: https://dotnetfiddle.net/Widget/jxKNG5
Although it is not good practice for production code, one way to work around, and prove this issue is a SQL issue is by reading the data into memory before executing the group by. The results of an IQueryable<T> expression can be loaded into memory using .ToList().
Rather than calling .ToList() on the entire table, if the filter conditions are not in question, call .ToList() after the filter criteria. If you accidentally leave this in your code after your debug session it is going to have less impact than if you were reading every record from the database
#region A safer way to bring the recordset into memory for debugging
// Build the query in 2 steps, first create the filtered query
var filteredHatoQuery = from L in db.Hato
where L.HatoRecDate >= startDate && L.HatoRecDate <= endDate
select L;
// you could also consider only projecting the columns you need
// select new { L.HatoRecDate, L.HatoLocation };
// then operate on the data
var qLocation = (from L in filteredHatoQuery.ToList() // remove the .ToList() to query against the DB
group L by L.HatoLocation into g
select new { HatoLocation = g.Key, count = g.Count() })
.OrderByDescending(o => o.count).ToList();
#endregion A safer way to bring the recordset into memory for debugging
To be honest, I had a really hard time re-creating a query where you could possibly get a Count() of zero. Zero items means no records in the group, which would normally prevent the group header from returning at all, in fact I tried a lot of different angles to this, and really can't figure it out.
There are two complicating factors for manually debugging a query like this:
Linq / C# group by is vastly different to SQL GROUP BY. In C# grouping simply splits the results into sub-arrays, all the records are still in the output, but in SQL the GROUP BY doesn't return all the records, it only returns the aggregate group results. To do this properly, the grouping should be realised in SQL as a nested query, it won't necessarily always involve a SQL GROUP BY.
Either way, the resulting SQL will NOT be as simple as this:
SELECT HatoLocation, COUNT(*)
FROM Hato
WHERE HatoRecDate >= '2021-05-21' AND HatoRecDate <= '2021-05-24'
GROUP BY HatoLoction
You are ordering by the results of an aggregate within a filter. This is not always a big deal, but it can often lead to complications in SQL if you are not also using a limiting factor like TOP. As a general proposition, if the sorting only affects the rendered output, and not the functional logic, then you should leave the sort process to the renderer. Or at the very least, sort In-Memory, not in the SQL.
The original query would evaluate into SQL similar to this:
(I have substituted the Start and end parameters #p_linq_0 and #p_linq_1)
SELECT
[Project1].[C2] AS [C1],
[Project1].[HatoLocation] AS [HatoLocation],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [HatoLocation],
1 AS [C2]
FROM ( SELECT
[Extent1].[HatoLocation] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Hato] AS [Extent1]
WHERE ([Extent1].[HatoRecDate] >= '2021-05-21') AND ([Extent1].[HatoRecDate] <= '2021-05-24')
GROUP BY [Extent1].[HatoLocation]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
But even that is not going to result in a count of zero. I can only assume that OPs runtime environment or database introduces some other factor that has not been taken into account for this exploration.
In Linq to Entities you can get the resulting SQL for queries that have not been read into memory simply by calling .ToString() on the query, or by using the inspector tool during a debug session. There is a good discussion in this post Get SQL query from LINQ to SQL?
For debugging purposes, it is a good idea to separate the linq query from the resulting enumerated or In-Memory result set, also in this example we have specifically isolated out the sort to occur after the .ToList() and the SQL has been written to the debug output.
var qLocationQuery = from L in db.Hato
where L.HatoRecDate >= startDate && L.HatoRecDate <= endDate
group L by L.HatoLocation into g
select new { HatoLocation = g.Key, count = g.Count() };
System.Diagnostics.Debug.WriteLine("Hato Query SQL:");
System.Diagnostics.Debug.WriteLine(qLocationQuery.ToString());
var qLocation = qLocationQuery.ToList();
// now perform the sort, this simulates leaving the sort to the rendering logic.
qLocation = qLocation.OrderByDescending(o => o.count).ToList();
Please update your post with the resulting SQL so we can further explore this!
Update
I've updated the fiddle with an actual DbContext implementation, I still cannot produce a grouping with a count of zero.
https://dotnetfiddle.net/G4RvUV
This shows how to extract the SQL query, but it shows there is something else wrong with your code. We either need to see more of the data, more of the schema, or a copy of the data without the grouping (as shown in the fiddle) so we can provide more assistance.

Try this...
Do the .ToList() and after that do the group by.

Related

Multiple Where vs Inner Join

I have a filter where depending on the user selection I conditionally add in more Where/Joins.
Which method is faster than the other and why?
Example with Where:
var queryable = db.Sometable.Where(x=> x.Id > 30);
queryable = queryable.Where(x=> x.Name.Contains('something'));
var final = queryable.ToList();
Example with Join:
var queryable1 = db.Sometable.Where(x=> x.Id > 30);
var queryable2 = db.Sometable.Where(x=> x.Name.Contains('something'));
var final = (from q1 in queryable1 join q2 in queryable2 on q1.Id equals q2.Id select q1).ToList();
NOTE: I would have preferred the multiple Where but it is causing error as described in a question. Hence had to shift to JOIN. Hope 'JOIN' code is not slower than multiple WHERE
I just tried running similar linq statements against an MSsql 2008 database table with 10million rows. I found that the query optimizer converted both statements into similar query plans and the performance difference was a wash.
I would say that as someone who is reading the code, the first example more clearly states your intentions, and therefore would be preferred. Many times performance is not the best metric to choose when evaluating code.
i whould go for the where clause, avoiding to self joining the same table and make the code clearer
you can add a log to your dbcontext to see the generated sql query
db.context.Database.Log = System.Diagnostic.Debug.WriteLine;
anyway to improve the performance of the query i would :
select ONLY the fields that you actually need (not *)
check the indexes of the table
do you really need the contains statement ? if the records grow a lot you will have performance issue with sql as "like '%XXX%'"
I'm sure you already understand that LINQ converts your code into a SQL statement. Your first query would result in something like:
SELECT * FROM Sometable WHERE Id > 30 AND Name LIKE '%something%'
Your second query would result in something like
SELECT q1.*
FROM Sometable q1
JOIN Sometable q2 ON q1.Id = q2.Id
WHERE q1.Id > 30 AND q2.Name LIKE '%something%')
Nearly every time, a select from a single will return results faster than a join between 2 tables.
If you LINQ statement is failing to add tables, be sure you are including them.
var queryable = db.Sometable.Include(i => i.ForeignTable).Where(x=> x.Id > 30);

Why is this query resolving to a DataQuery

I have a linq to sql query that gets all my logs for the current hour(Stored as an Iqueryable):
currentLogs = from dll in cDataContext.DownloadLogs
where dll.DTS.Hour == DateTime.Now.Hour
select dll
And then I have another query(also stored as an Iqueryable) that gets the logs that are currently being processed, and dont appear in the logs for that time slot.
notDownloadedIds = (from x in cDataContext.CategoryCountryCategoryTypeMappings
where !(
from dll in currentLogs
select dll.CategoryCountryCategoryTypeMappingID)
.Contains(x.CategoryCountryCategoryTypeMappingID)
select x);
When I debug and hover over currentLogs i see a sql query, when i hover over notDownloadedIDs i see a DataQuery. If i refactor NotDownloadedIDs to not use current logs, notDownloadedIds stays as a sql query, instead of a DataQuery. Why doesnt notDownloadedIds stay as a sql query, and/or how can I get it to stay like that.
If i dont I get problems down the line when use it in a method.
EDIT after using sanders advice i found out the sql statement generated is
SELECT ccc.[CategoryCountryCategoryTypeMappingID], ccc.[CountryID], ccc.[CategoryID],
ccc.[CategoryTypeID], ccc.[URLSegment], ccc.[DTS], ccc.[DTSUTC]
FROM [Store].[CategoryCountryCategoryTypeMappings] AS ccc
WHERE EXISTS(
SELECT *
FROM [dbo].[DownloadLog] AS [t1]
WHERE ([t1].[CategoryCountryCategoryTypeMappingID]
<> ccc.[CategoryCountryCategoryTypeMappingID])
AND (DATEPART(Hour, [t1].[DTS])) = (DATEPART(Hour, GETDATE()))
)
I need to change WHERE EXISTS .... column <> column, to WHERE NOT EXISTS ..... column =column. Is it possible to do this without resolving it to a dataquery?
There are subtle problems where you lose the direct-to-SQL mapping. Without diving into the details (I seem to recall .Contains() being problematic), I would recommend you try to refactor your query into a different form, for example:
notDownloadedIds = cDataContext.CategoryCountryCategoryTypeMappings.Where(mapping =>
!currentLogs.Select(dll => dll.CategoryCountryCategoryTypeMappingID)
.Any(id => id == mapping.CategoryCountryCategoryTypeMappingID))
If I read your code right, this should result in an equivalent query. Is this also transformed into a DataQuery?
My guess is that since you're incorporating external data (currentLogs) as part of your query that Linq-to-SQL is going to pull all data from CategoryCountryCategoryTypeMappings and then do the filtering in Linq-to-Objects.
By why does it matter? Certainly there could be a performance difference but I owuld expect you to expose the queries as anything other than IEnumerable<T> or IQueryable<T>.
Depending on the amount of objects in your currentLogs collection, you can do the following, which should result in a SQL query.
I think the maximum parameter count (which is what your ids will be translated to) for a query is around 2000.
var ids = currentLogs
.Select(x => x.CategoryCountryCategoryTypeMappingID)
.ToList();
notDownloadedIds =
from x in cDataContext.CategoryCountryCategoryTypeMappings
where !ids.Contains(x.CategoryCountryCategoryTypeMappingID)
select x;

Making a UNION query more efficient in LINQ

I am currently working on a project leveraging EF and I am wondering if there is a more efficient or cleaner way to handle what I have below.
In SQL Server I could get the data I want by doing something like this:
SELECT tbl2.* FROM
dbo.Table1 tbl
INNER JOIN dbo.Table2 tbl2 ON tbl.Column = tbls2.Colunm
WHERE tbl.Column2 IS NULL
UNION
SELECT * FROM
dbo.Table2
WHERE Column2 = value
Very straight forward. However in LINQ I have something that looks like this:
var results1 = Repository.Select<Table>()
.Include(t => t.Table2)
.Where(t => t.Column == null);
var table2Results = results1.Select(t => t.Table2);
var results2 = Repository.Select<Table2>().Where(t => t.Column2 == "VALUE");
table2Results = table2Results.Concat(results2);
return results2.ToList();
First and foremost the return type of the method that contains this code is of type IEnumerable< Table2 > so first I get back all of the Table2 associations where a column in Table1 is null. I then have to select out my Table2 records so that I have a variable that is of type IEnumerable. The rest of the code is fairly straightforward in what it does.
This seems awfully chatty to me and, I think, there is a better way to do what I am trying to achieve. The produced SQL isn't terrible (I've omitted the column list for readability)
SELECT
[UnionAll1].*
FROM (SELECT
[Extent2].*
FROM [dbo].[Table1] AS [Extent1]
INNER JOIN [dbo].[Table2] AS [Extent2] ON [Extent1].[Column] = [Extent2].[Column]
WHERE [Extent1].[Column2] IS NULL
UNION ALL
SELECT
[Extent3].*
FROM [dbo].[Table2] AS [Extent3]
WHERE VALUE = [Extent3].[Column]) AS [UnionAll1]
So is there a cleaner / more efficient way to do what I have described? Thanks!
Well, one problem is that your results may not return the same data as your original SQL query. Union will select distinct values, Union All will select all values. First, I think your code could be made a lot clearer like so:
// Notice the lack of "Include". "Include" only states what should be returned
// *with* the original type, and is not necessary if you only need to select the
// individual property.
var firstResults = Repository.Select<Table>()
.Where(t => t.Column == null)
.Select(t => t.Table2);
var secondResults = Repository.Select<Table2>()
.Where(t => t.Column2 == "Value");
return firstResults.Union(secondResults);
If you know that it's impossible to have duplicates in this query, use Concat instead on the last line (which will produce the UNION ALL that you see in your current code) for reasons described in more detail here. If you want something similar to the original query, continue to use Union like in the example above.
It's important to remember that LINQ-to-Entities is not always going to be able to produce the SQL that you desire, since it has to handle so many cases in a generic fashion. The benefit of using EF is that it makes your code a lot more expressive, clearer, strongly typed, etc. so you should favor readability first. Then, if you actually see a performance problem when profiling, then you might want to consider alternate ways to query for the data. If you profile the two queries first, then you might not even care about the answer to this question.

Sort Linq list with one column

I guess it should be really simple, but i cannot find how to do it.
I have a linq query, that selects one column, of type int, and i need it sorted.
var values = (from p in context.Products
where p.LockedSince == null
select Convert.ToInt32(p.SearchColumn3)).Distinct();
values = values.OrderBy(x => x);
SearchColumn3 is op type string, but i only contains integers. So i thought, converting to Int32 and ordering would definitely give me a nice 1,2,3 sorted list of values. But instead, the list stays ordered like it were strings.
199 20 201
Update:
I've done some tests with C# code and LinqPad.
LinqPad generates the following SQL:
SELECT [t2].[value]
FROM (
SELECT DISTINCT [t1].[value]
FROM (
SELECT CONVERT(Int,[t0].[SearchColumn3]) AS [value], [t0].[LockedSince], [t0].[SearchColumn3]
FROM [Product] AS [t0]
) AS [t1]
WHERE ([t1].[LockedSince] IS NULL)
) AS [t2]
ORDER BY [t2].[value]
And my SQL profiler says that my C# code generates this piece of SQL:
SELECT DISTINCT a.[SearchColumn3] AS COL1
FROM [Product] a
WHERE a.[LockedSince] IS NULL
ORDER BY a.[SearchColumn3]
So it look like C# Linq code just omits the Convert.ToInt32.
Can anyone say something useful about this?
[Disclaimer - I work at Telerik]
You can solve this problem with Telerik OpenAccess ORM too. Here is what i would suggest in this case.
var values = (from p in context.Products
where p.LockedSince == null
orderby "cast({0} as integer)".SQL<int>(p.SearchColumn3)
select "cast({0} as integer)".SQL<int>(p.SearchColumn3)).ToList().Distinct();
OpenAccess provides the SQL extension method, which gives you the ability to add some specific sql code to the generated sql statement.
We have started working on improving this behavior.
Thank you for pointing this out.
Regards
Ralph
Same answer as one my other questions, it turns out that the Linq provider i'm using, the one that comes with Telerik OpenAccess ORM does things different than the standard Linq to SQL provider! See the SQL i've posted in my opening post! I totally wasn't expecting something like this, but i seem that the Telerik OpenAccess thing still needs a lot of improvement. So be careful before you start using it. It looks nice, but it has some serious shortcomings.
I can't replicate this problem. But just make sure you're enumerating the collection when you inspect it. How are you checking the result?
values = values.OrderBy(x => x);
foreach (var v in values)
{
Console.WriteLine(v.ToString());
}
Remember, this won't change the order of the records in the database or anywhere else - only the order that you can retrieve them from the values enumeration.
Because your values variable is a result of a Linq expression, so that it doest not really have values until you calling a method such as ToList, ToArray, etc.
Get back to your example, the variable x in OrderBy method, will be treated as p.SearchColumn3 and therefore, it's a string.
To avoid that, you need to let p.SearchColumn3 become integer before OrderBy method.
You should add a let statement in to your code as below:
var values = (from p in context.Products
where p.LockedSince == null
let val = Convert.ToInt32(p.SearchColumn3)
select val).Distinct();
values = values.OrderBy(x => x);
In addition, you can combine order by statement with the first, it will be fine.

Does Linq retrieve all records first if I do a Count?

Assume I have a table called Population that stores some demographic data. In T-SQL, to get the count of people over 50, I might do something like this:
SELECT COUNT(*) FROM POPULATION
WHERE AGE > 50
I thought the following linq statement would work, but it just returns zero and I don't understand why.
var count = _context.Population.Count(x => x.Age > 50);
In order for me to actually get the count, I have to do either of the following:
var count = _context.Populaton.Where(x => x.Age > 50).Count();
var count = _context.Population.Select(x => x.Age > 50).Count();
Why are the above scenarios the case?
Linq does not retrieve all the records first. It defers the execution of the query until last possible moment. This allows the query to be optimized.
http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx
I have found that order is important sometimes. Hope this helps.
Bob
In all cases Count() will NOT do the computation in memory based on the records returned from the database, but it will actually change the generated SQL to include the COUNT statement. A simplistic version of your generated TSQL query would be something like:
SELECT
COUNT(1)
FROM [dbo].[Population] AS [Extent1]
WHERE [Extent1].[Age] > 50
When you call Count() the query is executed immediately. All your queries seem to be correct, so check your database, provider and context to make sure the query is executing properly.
Correct.
like in any SQL statement, it has a certain order you must follow.
and if you do a count on where, you basically don't give him anything to do a count on.

Categories