SQL "IN" statement in linq query mistake, how resolve? - c#

I have this query in SQL:
SELECT *
FROM TableName
WHERE myData IN (SELECT MAX(myData) AS DATA_MAX
FROM TableName
GROUP BY id1, id2)
I want replicate it in Linq (c#) - how can I do that?

This isn't really a direct answer because it doesn't implement it via LINQ; but it does solve the problem, with the minimum amount of fuss:
You can use tools like "Dapper" to execute raw queries without involving any LINQ. If you're using something like LINQ-to-SQL or Entity Framework, the data-context there also usually has a raw query API that you can use, but I'm going to show a "Dapper" implementation:
class SomeType
{
// not shown: properties that look like the columns
// of [TableName] in the database - correct names/types
}
...
var data = connection.Query<SomeType>(#"
SELECT * FROM TableName
WHERE myData IN (Select max(myData) as DATA_MAX from TableName group
by id1, id2)").AsList();
This approach makes it very easy to migrate existing SQL queries without having to rewrite everything as LINQ.
If you are using LINQ-to-SQL, DataContext has a similiar ExecuteQuery<TResult> method. Entity Framework has a SqlQuery method

Long story short - don't use LINQ, optimize the query and use a microORM like Dapper to map results to classes :
var query = "Select * "
"from ( select *, " +
" ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN " +
" From TableName ) T " +
"where RN=1";
var data = connection.Query<SomeType>(query);
LINQ isn't a replacement for SQL. ORMs in general aren't meant to write reporting queries like this one.
Reporting queries need a lot of optimization and usually have to change in production. You don't want to have to redeploy your application each time a query changes. In this case it's far better to create a view and map to it using a microOMR like Dapper.
This specific query could require two table scans, one to calculate the maximum per id1,id2 and one to find the rows with matching mydata. The intermediate data would have to be spooled into tempdb too. If mydata is covered by an index, it may not be such an expensive query. If it isn't, all the data will be scanned twice.
An alternative is to calculate the ranking of each row by mydata based on id1, id2. You can do this with one of the ranking functions like ROW_NUMBER, RANK, NTILE.
Select *
from ( select *,
ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
From TableName) T
where RN=1
You can use that query directly with Dapper or create a view and map your entities to the view, not the table itself.
One option would be to crate a MyTableRanked view :
CREATE VIEW MyTableRanked AS
select *,
ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
From TableName
This would allow you to write :
var query="Select * from MyTableRanked where RN=#rank";
var data = connection.Query<SomeType>(query,new {rank=2});
Allowing you to return the top N records per ID1,ID2 combination

You can try this. May be it will work.
var myData = (from c in _context.TableName
group c by new
{
c.id1,
c.id2
} into gcs
select new
{
gcs.Max(p=>p.myData)
}).AsQueryable();
var result = (from t in _context.TableName
where myData.Contains(t.myData)
select t).ToList();

Related

Incorrect syntax whilst using Dapper

Firstly, I've ran the query against a database and the results are fine.
The model has two properties attached to it {Skip, Take} and are populated with values at the time of execution, however the async query is failing due to:
Incorrect syntax near #Take
I've tested a simpler query select * from table where col1 = #Take and seems to be working perfectly, very odd.
Any ideas?
var query = await conn.QueryAsync<ObjectModel>(
#" SELECT TOP #Take * FROM ( SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RoNum, *
FROM table) as p
where #Skip < RoNum ORDER BY p.ID", model);
For SQL Server, the supported syntax for TOP is the following:
SELECT TOP (#Take) * FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RoNum, * FROM table) as p WHERE #Skip < RoNum ORDER BY p.ID
It is possible the database you are using is hitting the same limitation.

how to page and query large data in c#

I need to query large amount of data from a database via C# and ADO.NET (IDbDataReader and IDbCommand API).
I've created a query like the follwing:
WITH v as
(SELECT myFields, Datefield
ROW_NUMBER() OVER (ORDER BY Datefield ASC) AS CurrentRow
FROM dbTable
WHERE /**/
AND Datefield BETWEEN #pStart AND #pEnd
// ... )
SELECT myFields, Datefield from v where CurrentRow
BETWEEN #pRowStart AND #pRowEnd
From the results I have to use an C# API which will transform and generate new data,
thats why an SqlServer - only solution can't be used.
I want to query against the database with a pagesize of 10000 until there is no more data.
Something like
while (true)
{
// ... execute reader
if (reader.HasRows)
break;
}
will not work cause I have to use the IDbDataReader interface.
What can I do in my situation?
EDIT + Solution
I iterate over each block in a while loop and check HasRows property of the datareader, cause i can use the specialized type.
On SQL Server 2012 you can use the OFFSET and FETCH clauses for simple and efficient paging, as shown in the ORDER BY clause examples:
SELECT DepartmentID, Name, GroupName
FROM HumanResources.Department
ORDER BY DepartmentID ASC
OFFSET #StartingRowNumber - 1 ROWS
FETCH NEXT #RowCountPerPage ROWS ONLY;
On previous versions you can use a CTE and ROW_NUMBER() to calculate a number for each row and limit the results:
WITH OrdersRN AS
(
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum
,OrderID
,OrderDate
,CustomerID
,EmployeeID
FROM dbo.Orders
)
SELECT *
FROM OrdersRN
WHERE RowNum BETWEEN (#PageNum - 1) * #PageSize + 1
AND #PageNum * #PageSize
ORDER BY OrderDate ,OrderID;

Linq-to-entities, get results + row count in one query

I've seen multiple questions about this matter, however they were 2 years (or more) old, so I'd like to know if anything changed about this.
The basic idea is to populate a gridview and create custom paging. So, I need the results and row count as well.
In SQL this would be something like:
SELECT COUNT(id), Id, Name... FROM ... WHERE ...
Getting everything in a nice simple query. However, I'd like to be consistent and use Linq2Entities.
So far I'm using the approach with two queries (against sql server), because it just works. I would like to optimize it though and use a single query instead.
I've tried this:
var query = from o in _db.Products
select o;
var prods = from o in query
select new
{
Count = query.Count(),
Products = query
};
This produces a very nasty and long query with really unnecessary cross joins and other stuff which I don't really need or want.
Is there a way to get the paged results + count of all entities in a one simple query? What is the recommended approach here?
UPDATE:
Just tried FutureQueries and either I'm doing something wrong, or it actually executes two queries. This shows my sql profiler:
-- Query #1
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID]
) AS [GroupBy1];
And next row:
-- Query #1
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent1].[Price] AS [Price],
[Extent1].[CategoryID] AS [CategoryID]
FROM [dbo].[Products] AS [Extent1]
WHERE 1 = [Extent1].[CategoryID];
The C# code:
internal static List<Product> GetProducts(out int _count)
{
DatabaseEntities _db = new DatabaseEntities();
var query = from o in _db.Products
where o.CategoryID == 1
select o;
var count = query.FutureCount();
_count = count.Value;
return query.Future().ToList();
}
Did I miss something? According to my profiler it does exactly the same except that added row in the query (-- Query #1).
Have a look at Future Queries to do this in EntityFramework.Extended. The second example on that linked page uses FutureCount() to do exactly what you want. Adapted here:
var q = db.Products.Where(p => ...);
var qCount = q.FutureCount();
var qPage = q.Skip((pageNumber-1)*pageSize).Take(pageSize).Future();
int total = qCount.Value; // Both queries are sent to the DB here.
var tasks = qPage.ToList();
this 'EntityFramework.Extended' library is no longer supported use this one instead:
entityframework-plus and go here:
https://entityframework-plus.net/query-future to see how you can get count and records
in the same query.

Duplicate field name problem in a nested multi-mapping Dapper pagination query

I've ran into an issue when trying to do multi-mapping using Dapper, for pagination queries.
Because I am using a nested query in this pagination scenario, there are multiple tables within the nested query that I must join to get my multi-mapped data, but some of these tables will share some fields of the same name which you can see in my example query below (e.g. id, displayname and email):
q = #"select * from (select p.id, p.title, p.etc...,
u1.id, u1.displayname, u1.email,
u2.id, u2.displayname, u2.email,
t.id, t.name,
row_number() over (order by " + sort.ToPostSortSqlClause() + ") as rownum" +
" from posts p" +
" join users u1 on p.owneruserid = u1.id" +
" join users u2 on p.lastediteduserid = u2.id" +
" join topics t on p.topicid = t.id" +
") seq where seq.rownum between #pLower and #pUpper";
In the example above you can see that within the nested query, there are going to be problems with the fields id (appears in the posts table, both users table joins and the topics table join), and also displayname and email (appear in both users table joins).
The only workaround I have thought of so far involves casting each of these 'problem' fields as a different name, but this then involves the very messy process of creating dummy properties in the affected models, so multimapping can map into these, and editing the 'real' properties in my models to also check the dummy property for a value if the real value has not been set.
Also, in the above scenario I would have to create x dummy properties where x is the number of joins I may have on the same table within a query (in this example, 2 joins on the same Users table, therefore requiring 2 uniquely named dummy properties just for Dapper mapping purposes).
This is obviously not ideal and I'm sure would have knock on problems and more untidyness as I created more of these multi-mapping pagination queries.
I'm hoping there is nice, clean solution to this problem?
There are 2 options I can think of:
option 1: join back to your extended properties outside of your nested query:
select s.*, t1.*, t2.* from
(
select s.*, ROW_NUMBER() OVER (order by somecol) AS RowNumber from Something s
) as X
left join Table t1 on Id = x.SomeId
left join Table t2 on Id = x.SomeOtherId
option 2: Extend SqlBuilder to handle column aliasing:
select s.*, /**unalias(Table,t1)**/, /**unalias(Table,t2)**/ from
(
select s.*, /**alias(Table,t1)**/, /**alias(Table,t2)**/ ROW_NUMBER() OVER (order by somecol) AS RowNumber from Something s
left join Table t1 on Id = x.SomeId
left join Table t2 on Id = x.SomeOtherId
) as X
Then define the alias macro to query and cache a list of columns from the db using INFORMATION_SCHEMA.COLUMNS and simply add a 'column as column_t1` string for each column.
Unalias can do the reverse quite simply.

Linq2Sql: Get every N'th row [duplicate]

Anybody know how to write a LINQ to SQL statement to return every nth row from a table? I'm needing to get the title of the item at the top of each page in a paged data grid back for fast user scanning. So if i wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL Statement would be something like WHERE ROW_NUMBER MOD 3 = 0, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, TSQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(#"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n:
var data = db.ExecuteQuery<SomeObjectType>(#"
DECLARE #n int = 2
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % #n) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following Sql
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % #p0) = #p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit I haven't found (or experienced) an option for Linq to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query written out and then calling the sproc via Linq to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc) value from the given query. If you use this technique, you'll probably want to create a limit of maybe ten or twenty names, similar to how Google results page (1-10, and Next), in order to avoid getting an expression so large the database refuses to attempt to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.

Categories