How to page and query large data in C#

I need to query a large amount of data from a database via C# and ADO.NET (the IDataReader and IDbCommand APIs).
I've created a query like the following:
WITH v AS
(SELECT myFields, Datefield,
ROW_NUMBER() OVER (ORDER BY Datefield ASC) AS CurrentRow
FROM dbTable
WHERE /**/
AND Datefield BETWEEN @pStart AND @pEnd
-- ...
)
SELECT myFields, Datefield FROM v WHERE CurrentRow
BETWEEN @pRowStart AND @pRowEnd
From the results I have to use a C# API which will transform and generate new data; that's why a SQL Server-only solution can't be used.
I want to query the database with a page size of 10000 until there is no more data.
Something like
while (true)
{
    // ... execute reader
    if (!reader.HasRows)
        break;
}
will not work, because I have to use the IDataReader interface, which does not expose a HasRows property.
What can I do in my situation?
EDIT + Solution
I iterate over each block in a while loop and check the HasRows property of the data reader, since I can cast to the specialized reader type.
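A minimal sketch of that loop, with the page fetch abstracted behind a delegate so the snippet is self-contained (in practice the delegate would set @pRowStart/@pRowEnd on an IDbCommand and read the rows via IDataReader.Read(); all names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Pager
{
    // Repeatedly fetches pages [rowStart, rowEnd] until a short (or empty)
    // page signals the end of the data.
    public static IEnumerable<T> ReadAllPages<T>(
        Func<int, int, IReadOnlyList<T>> fetchPage, int pageSize)
    {
        for (int rowStart = 1; ; rowStart += pageSize)
        {
            IReadOnlyList<T> page = fetchPage(rowStart, rowStart + pageSize - 1);
            foreach (T row in page)
                yield return row;
            if (page.Count < pageSize)   // last page reached
                yield break;
        }
    }
}
```

Each page is one round trip; with the ROW_NUMBER query above, the delegate would simply bind the two row-boundary parameters and materialize the reader's rows into a list.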

On SQL Server 2012 and later you can use the OFFSET and FETCH clauses of ORDER BY for simple and efficient paging:
SELECT DepartmentID, Name, GroupName
FROM HumanResources.Department
ORDER BY DepartmentID ASC
OFFSET @StartingRowNumber - 1 ROWS
FETCH NEXT @RowCountPerPage ROWS ONLY;
On previous versions you can use a CTE and ROW_NUMBER() to calculate a number for each row and limit the results:
WITH OrdersRN AS
(
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum
,OrderID
,OrderDate
,CustomerID
,EmployeeID
FROM dbo.Orders
)
SELECT *
FROM OrdersRN
WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1
AND @PageNum * @PageSize
ORDER BY OrderDate, OrderID;

Related

How to create a ROWNUMBER column that always holds a sequential set of numbers from 1 to N, where N is the row count

In SQL Server, I am trying to create a column of sequential numbers to help me with my code. I am not sure how to create a column that is always populated with a sequential set of numbers starting from 1.
1
2
3
N
SqlCommand command = new SqlCommand("SELECT *, ROW_NUMBER() OVER(ORDER BY Id) AS RowRankNumber FROM Statements WHERE RowRankNumber >= "+1+" AND RowRankNumber <= "+4+"", con);
You need to use an inner query to get the result: you can't reference the ROW_NUMBER() alias in the WHERE clause of the same SELECT. For example:
SELECT * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY Id) AS RowRankNumber,* FROM Statements
) x WHERE RowRankNumber >=1 AND RowRankNumber <=4
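The same number-then-filter shape can be seen in plain C# with LINQ-to-Objects, where Select's indexed overload plays the role of ROW_NUMBER() and the projection plays the role of the derived table (names here are illustrative, not from the original question):

```csharp
using System;
using System.Linq;

static class RowNumberDemo
{
    // Numbers items starting at 1 and then filters on that number,
    // the in-memory analogue of wrapping ROW_NUMBER() in a derived table.
    public static T[] Page<T>(T[] items, int first, int last) =>
        items.Select((x, i) => new { x, RowRankNumber = i + 1 })
             .Where(r => r.RowRankNumber >= first && r.RowRankNumber <= last)
             .Select(r => r.x)
             .ToArray();
}
```

The numbering has to exist before the filter runs, which is exactly why the SQL needs the inner query.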

Sql Delete Query Optimizations for Azure

I need some help with optimizing following query.
My Problem
I am trying to clean up a table based on a size parameter (delete x MB from this table). The way I thought of implementing it is: iterate over the table starting with the oldest entry, get each row's size (I'm only taking blob columns into account), then iterate over all linked tables and do the same operation for them; if currentSize >= size, stop the query and return the list of GUIDs found.
Please note that this is part of a bigger query, so in the end I need the list of Ids.
What I've tried
First, I tried writing it using Entity Framework, but its execution took too long and it was still only halfway done. So I wrote it directly in T-SQL.
Below is what I managed to write. However, when running it against an Azure SQL Database, it throws a timeout exception. I know this is due to the DTU limitation, but I'm also wondering if the query itself can be improved. I am no SQL expert and I need your help.
Current Query
DECLARE @maxSize int = 1
DECLARE @tempTable TABLE
(
    Id uniqueidentifier,
    Size float,
    Position int
)
DECLARE @currentId uniqueidentifier
DECLARE @maxIterations int
DECLARE @index int = 1
SET @maxIterations = (SELECT COUNT(Id) FROM WhereToDelete)
WHILE(@index < @maxIterations)
BEGIN
    INSERT INTO @tempTable
    SELECT MasterJobGUID, ISNULL(DATALENGTH(BlobColumn1),0) +
           ISNULL(DATALENGTH(BlobColumn2),0) +
           ISNULL(DATALENGTH(BlobColumn3),0) +
           ISNULL(DATALENGTH(BlobColumn4),0),
           @index
    FROM WhereToDelete
    ORDER BY SomeColumn
    OFFSET @index ROWS
    FETCH NEXT 1 ROWS ONLY

    SET @index = @index + 1
    SET @currentId = (SELECT TOP 1 Id FROM @tempTable ORDER BY Position DESC)

    UPDATE @tempTable
    SET Size = Size + ( SELECT SUM(ISNULL(DATALENGTH(BlobColumn),0))
                        FROM LinkedTable
                        WHERE ParentId = @currentId )

    UPDATE @tempTable
    SET Size = Size + ( SELECT ISNULL(SUM(ISNULL(DATALENGTH(OtherBlobColumn),0)),0)
                        FROM OtherLinkedTable
                        WHERE OtherLinkedTableId IN
                        (
                            SELECT OtherLinkedTableId FROM SomeTable
                            WHERE SomeTableId IN
                            (
                                SELECT SomeTableId FROM SomeOtherTable
                                WHERE ParentId = @currentId
                            )
                        ))

    IF ((SELECT SUM(Size) FROM @tempTable) >= @maxSize * 1000000)
    BEGIN
        BREAK;
    END
END
SELECT Id FROM @tempTable
You could try something like this
SELECT MasterJobGUID FROM (
SELECT [MasterJobGUID], SUM(ISNULL(DATALENGTH(BlobColumn1),0) +
ISNULL(DATALENGTH(BlobColumn2),0) +
ISNULL(DATALENGTH(BlobColumn3),0) +
ISNULL(DATALENGTH(BlobColumn4),0))
OVER (ORDER BY SomeColumn ROWS UNBOUNDED PRECEDING) SizeTotal
FROM WhereToDelete) innerQuery
WHERE [SizeTotal] < @maxSize * 1000000
That's using T-SQL window functions to compute the running total size and return only the rows that match the criteria, all in a single operation. It should be far more efficient than a row-by-row loop.
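The cumulative-sum idea behind that query is easy to mirror in C# for the in-memory part of the pipeline. A sketch with hypothetical row sizes; it keeps taking rows, oldest first, while the inclusive running total stays under the cap, just like the windowed SUM filter:

```csharp
using System;
using System.Collections.Generic;

static class SizeCap
{
    // Returns the ids whose inclusive running size total stays under maxBytes,
    // mirroring WHERE [SizeTotal] < @maxSize * 1000000 over a running SUM(...).
    public static List<Guid> TakeWhileUnder(
        IEnumerable<(Guid Id, long SizeBytes)> rowsOldestFirst, long maxBytes)
    {
        var ids = new List<Guid>();
        long runningTotal = 0;
        foreach (var (id, size) in rowsOldestFirst)
        {
            runningTotal += size;
            if (runningTotal >= maxBytes)   // cap reached, stop collecting
                break;
            ids.Add(id);
        }
        return ids;
    }
}
```

The database version is still preferable here, since it avoids shipping every blob length to the client, but the loop shows what the window function computes.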

SQL "IN" statement mistake in a LINQ query: how to resolve it?

I have this query in SQL:
SELECT *
FROM TableName
WHERE myData IN (SELECT MAX(myData) AS DATA_MAX
FROM TableName
GROUP BY id1, id2)
I want to replicate it in LINQ (C#). How can I do that?
This isn't really a direct answer because it doesn't implement it via LINQ; but it does solve the problem, with the minimum amount of fuss:
You can use tools like "Dapper" to execute raw queries without involving any LINQ. If you're using something like LINQ-to-SQL or Entity Framework, the data-context there also usually has a raw query API that you can use, but I'm going to show a "Dapper" implementation:
class SomeType
{
// not shown: properties that look like the columns
// of [TableName] in the database - correct names/types
}
...
var data = connection.Query<SomeType>(@"
SELECT * FROM TableName
WHERE myData IN (SELECT MAX(myData) AS DATA_MAX FROM TableName
                 GROUP BY id1, id2)").AsList();
This approach makes it very easy to migrate existing SQL queries without having to rewrite everything as LINQ.
If you are using LINQ-to-SQL, DataContext has a similar ExecuteQuery<TResult> method; Entity Framework has a SqlQuery method.
Long story short: don't use LINQ here; optimize the query and use a micro-ORM like Dapper to map the results to classes:
var query = "Select * " +
            "from ( select *, " +
            "       ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN " +
            "       From TableName ) T " +
            "where RN=1";
var data = connection.Query<SomeType>(query);
LINQ isn't a replacement for SQL, and ORMs in general aren't meant for reporting queries like this one.
Reporting queries need a lot of optimization and usually have to change in production. You don't want to redeploy your application each time a query changes. In this case it's far better to create a view and map to it using a micro-ORM like Dapper.
This specific query could require two table scans, one to calculate the maximum per id1,id2 and one to find the rows with matching mydata. The intermediate data would have to be spooled into tempdb too. If mydata is covered by an index, it may not be such an expensive query. If it isn't, all the data will be scanned twice.
An alternative is to calculate the ranking of each row by mydata based on id1, id2. You can do this with one of the ranking functions like ROW_NUMBER, RANK, NTILE.
Select *
from ( select *,
ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
From TableName) T
where RN=1
You can use that query directly with Dapper or create a view and map your entities to the view, not the table itself.
One option would be to create a MyTableRanked view:
CREATE VIEW MyTableRanked AS
select *,
ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
From TableName
This would allow you to write :
var query = "Select * from MyTableRanked where RN=@rank";
var data = connection.Query<SomeType>(query,new {rank=2});
Allowing you to return the top N records per ID1,ID2 combination
You can try this; maybe it will work.
var myData = (from c in _context.TableName
              group c by new
              {
                  c.id1,
                  c.id2
              } into gcs
              select gcs.Max(p => p.myData)).AsQueryable();
var result = (from t in _context.TableName
              where myData.Contains(t.myData)
              select t).ToList();
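The same shape compiles and runs against plain collections. A self-contained sketch with a hypothetical row type:

```csharp
using System.Collections.Generic;
using System.Linq;

record Row(int id1, int id2, int myData);

static class GroupMaxDemo
{
    // Rows whose myData equals the maximum myData of some (id1, id2) group:
    // the in-memory counterpart of
    // WHERE myData IN (SELECT MAX(myData) ... GROUP BY id1, id2).
    public static List<Row> MaxPerGroup(IEnumerable<Row> rows)
    {
        var list = rows.ToList();
        var maxes = list.GroupBy(r => new { r.id1, r.id2 })
                        .Select(g => g.Max(r => r.myData))
                        .ToHashSet();
        return list.Where(r => maxes.Contains(r.myData)).ToList();
    }
}
```

Note the same caveat as the SQL: filtering on myData alone can also match a row from a different group that happens to share the same myData value, which is one reason the ROW_NUMBER answer above is more robust.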

Batching data from a temp table

I create a temp table, insert data (with OUTPUT) and select. I need to read the data in batches.
Example: if I have 110 records and I want to read a batch of 50 records each time, I read rows 1-50 the first time, 51-100 the second time and 101-110 the third time.
The queries are as follows
create table #mytable
(
    customerID int not null
)
insert into #mytable(customerID)
output inserted.customerID
select top 50 customerID from customer where customerID not in (select c.customerID from #mytable c)
In C#, my code looks like this:
do {
getCustomerInformation() //this method does the query processing
} while(customerInfo.Any());
This query works when I run it in SQL Server Management Studio, but not from C#: it keeps returning the first 50 rows every time.
Simply use this query. The first time, pass 1 for the @PageNumber parameter; the second time pass 2, and the third time pass 3.
DECLARE @PageNumber AS INT, @RowsPerPage AS INT
SET @PageNumber = 1 -- 2, 3
SET @RowsPerPage = 50
SELECT customerID FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY id) AS Numero,
           customerID FROM customer
) AS TBL
WHERE Numero BETWEEN ((@PageNumber - 1) * @RowsPerPage + 1) AND (@PageNumber * @RowsPerPage)
ORDER BY customerID
In SSMS you have your temp table, and you check the IDs for your batch against that temp table.
Temp tables normally only live as long as the session, so if your C# code opens a new session each time it calls the method, that session will not see the temp table from the previous one.
You can get around this with a global temp table (##mytable), or just create an actual staging table.
There are two ways:
1) From SQL Server 2012, you can use the OFFSET and FETCH NEXT clauses to achieve pagination.
2) On all versions since SQL Server 2005, you can achieve the same with a CTE and ROW_NUMBER(), passing the page number as a parameter.
You can achieve paging in C# using the Take() and Skip() methods provided by LINQ:
const int batchSize = 50;
do {
    var currentBatch = customerInfo.Take(batchSize).ToList();
    // ... process currentBatch
    customerInfo = customerInfo.Skip(batchSize);
} while (customerInfo.Any());
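A self-contained version of that pattern (note this pages in memory only; the database still returns everything unless the SQL itself pages):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Batcher
{
    // Splits a sequence into consecutive batches of at most batchSize items,
    // enumerating the source only once (unlike repeated Take/Skip chains).
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
            yield return batch;   // trailing partial batch
    }
}
```

With the question's 110 records and a batch size of 50, this yields batches of 50, 50 and 10.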

Linq to dataset and display linq result in repeater control

I need to display distinct rows; actually, I should get the single latest row for each year, based on a date field.
I tried to resolve this using a CTE, but on further testing it is not working properly: it only gets the row based on RowNumber, and if I add a filter it doesn't return the desired results.
So I thought of getting AlbumID, AlbumName, AlbumDate and the album year (YYYY) from the table, passing it to a DataSet, and then using LINQ against this DataSet to get the unique rows, based on the latest album for each year.
Assuming my Table has following rows
AlbumID, AlbumName, AlbumDate , AlbumIcon
My MS SQL query:
string strSql = "SELECT AlbumID, AlbumName, AlbumIcon, AlbumDate, DATEPART(YYYY, AlbumDate) AS Year FROM PhotoAlbumName";
DataSet ds = DataProvider.Connect_Select(strSql);
DataView dv = ds.Tables[0].DefaultView;
//DO LINQ HERE and pass the value of linq to Pager control
PagerControl1.BindDataWithPaging(rptAlbumsCategories, dv.Table);
I am not sure how to do this, but if it works it will eliminate the unwanted results produced by the following SQL query:
;WITH DistinctYEAR AS
(
SELECT AlbumID, AlbumIcon, AlbumDate, AlbumVisible,AlbumName,
ROW_NUMBER() OVER(PARTITION BY DATEPART(YYYY,AlbumDate) ORDER BY AlbumDate) AS 'RowNum'
FROM PhotoAlbumName
)
SELECT * FROM DistinctYEAR WHERE RowNum = 1 AND AlbumVisible = 1 ORDER BY AlbumDate DESC
UPDATE:
I'm not sure, but I assume your CTE is incorrect. You should apply WHERE AlbumVisible = 1 inside the CTE, not in the outer SELECT:
;WITH DistinctYEAR AS
(
SELECT AlbumID, AlbumIcon, AlbumDate, AlbumVisible,AlbumName,
ROW_NUMBER() OVER(PARTITION BY DATEPART(YYYY,AlbumDate) ORDER BY AlbumDate) AS 'RowNum'
FROM PhotoAlbumName
WHERE AlbumVisible = 1
)
SELECT dy.*
FROM DistinctYEAR dy
WHERE RowNum = 1
ORDER BY AlbumDate DESC
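For the "DO LINQ HERE" step, here is a sketch of a latest-album-per-year query against the DataTable (column names follow the SELECT above; note it orders descending within each year to pick the latest album, whereas the CTE's ascending ORDER BY AlbumDate would pick the earliest):

```csharp
using System;
using System.Data;
using System.Linq;

static class AlbumQueries
{
    // One row per year: the latest album of that year, newest year first.
    public static DataTable LatestPerYear(DataTable albums)
    {
        var rows = albums.AsEnumerable()
            .GroupBy(r => r.Field<DateTime>("AlbumDate").Year)
            .Select(g => g.OrderByDescending(r => r.Field<DateTime>("AlbumDate")).First())
            .OrderByDescending(r => r.Field<DateTime>("AlbumDate"));
        return rows.CopyToDataTable();
    }
}
```

The result can then be handed to the pager control in place of the raw table, e.g. `PagerControl1.BindDataWithPaging(rptAlbumsCategories, AlbumQueries.LatestPerYear(ds.Tables[0]));`. An AlbumVisible filter could be added with a `.Where(...)` before the GroupBy, matching the corrected CTE.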
