I have code in my C# console app that is querying a LARGE dataset in SQL, and adding it to an IEnumerable collection that I use to iterate through later in the app. On a SQL table that returns less than 100K rows, it works great, but I have to use this to iterate through 100 Million records, After the SQL query runs, and Dapper tries to fill the collection, I end up with an OUT OF MEMORY exception error. I'm pretty certain it's because it's trying to write 100 Million objects at a time. Is there a way I can batch a collection with no more than say 500K objects, do what I need to do then come back and process another 500K and so on? I essentially need to READ from SQL 500K records, then write those to a file, Read another 500K , write to another file.
public List<AxDlsd> GetDistinctDlsdObjects(AxApp axApp, OperationType operationType)
{
if (operationType == OperationType.Assessment)
{
string query = $"SELECT DISTINCT(clipid) from {axApp.dlname}";
using (var connection = _dbConnectionFactory.GetAxDbConnection())
{
//SqlMapper.Settings.CommandTimeout = 0;
List<AxDlsd> dlsdrecord = new List<AxDlsd>();
return connection.Query<AxDlsd>(query, commandTimeout: 0, buffered: false ).ToList();
}
}
You can do a SELECT COUNT(DISTINCT clipid) from {axApp.dlname} to get the total and then use that to page
int pageSize = 500000;
for(var page = 0; page < (total / pageSize) + 1; page++)
{
string query = $"SELECT DISTINCT(clipid) from {axApp.dlname} ORDER BY clipid OFFSET {page * pageSize} FETCH NEXT {pageSize} ROWS ONLY";
///...
}
This will allow you to go through 500k rows at a time or whatever you page size is. FETCH/OFFSET does require SQL Server 2012. I'm not sure what SQL you are using.
Related
DECLARE #PageNumber AS INT
DECLARE #RowsOfPage AS INT
DECLARE #MaxTablePage AS FLOAT
SET #PageNumber = 1
SET #RowsOfPage = 4
SELECT #MaxTablePage = COUNT(*) FROM SampleFruits
SET #MaxTablePage = CEILING(#MaxTablePage/#RowsOfPage)
WHILE #MaxTablePage >= #PageNumber
BEGIN
SELECT FruitName, Price
FROM SampleFruits
ORDER BY Price
OFFSET (#PageNumber-1) * #RowsOfPage ROWS
FETCH NEXT #RowsOfPage ROWS ONLY
SET #PageNumber = #PageNumber + 1
END
I have created 2 SQL Server paginated queries following the above sample script found at this link: https://www.sqlshack.com/pagination-in-sql-server/#:~:text=What%20is%20Pagination%20in%20SQL,pagination%20solution%20for%20SQL%20Server.
I want to load their results in .NET lists, something like this:
List<Item> currentItemVersion = GetCurrentItemVersion();
List<Item> itemVersionHistory = GetItemVersionHistory();
foreach (Item myItem in currentItemVersion)
{
if (myItem.IsGood == true)
{
List<Items> goodItems = itemVersionHistory.Where(x => x.Item_ID == myItem.Item_ID).ToList();
foreach (Item ItemVersions in goodItems)
{
// DO SOME THINGS HERE
}
}
}
In this C# code, lists currentItemVersion and itemVersionHistory have only the first 4 items returned by the first page of the underlying T-SQL paging query, so I can only process 4 items in my T-SQL's results first page.
How do I process all the items in the several pages returned by my underlying SQL Server paged queries?
Or is this actually the correct way of doing what I am trying to do?
I have a database table having 100 million records. Screen Shot is taken from Robomongo
Table Schema: There are 100 million records
When I run the following code. I get results, but It takes around 1 minute to get completed. I need to optimize the query to get results faster. What I have done till now is here. Please tell me the way forward to achieve the optimized result.
var collection = _database.GetCollection<BsonDocument>("FloatTable1");
var sw = Stopwatch.StartNew();
var builder = Builders<BsonDocument>.Filter;
int min = Convert.ToInt32(textBox13.Text); //3
int max = Convert.ToInt32(textBox14.Text); //150
var filt = builder.Gt("Value", min) & builder.Lt("Value", max);
var list = collection.Find(filt);
sw.Stop();
TimeSpan time = sw.Elapsed;
Console.WriteLine("Time to Fetch Record: " + time.ToString());
var sw1 = Stopwatch.StartNew();
var list1 = list.ToList();
sw1.Stop();
TimeSpan time1 = sw1.Elapsed;
Console.WriteLine("Time to Convert var to List: " + time1.ToString());
Console.WriteLine("Total Count in List: " + list1.Count.ToString());
Out Put is:
Time to Fetch Record: 00:00:00.0059207
Time to Convert var to List: 00:01:00.7209163
Total Count in List: 1003154
I have few question related to the given code.
When line collection.Find(filt) executes, does it fetch filtered record from the database OR Just creating filter?
var list1 = list.ToList(); takes 1 minute to execute, is it only converting from var to list OR First fetching data than converting?
How to achieve this query and result in least possible time. Please Help.
When line collection.Find(filt) executes, does it fetch filtered
record from the database OR Just creating filter?
It is just creating the filter.
var list1 = list.ToList(); takes 1 minute to execute, is it only
converting from var to list OR First fetching data than converting?
It is fetching the data and converting.
How to achieve this query and result in least possible time. Please Help.
The fetch / filtering on the database is eating your time. The easiest way to speed it up would be creating an index on the column you are filtering.
Everything else would need some more effort or database technologies, like creating a column which more roughly presents your date (e.g. grouped by day) and indexing this one, or creating something like table sections grouped by a given timespan (I'm not a DB-Admin and don't know the proper terms for this, but I remember somebody doing it on a database with billions of records ;) )
I have around 25k records in datatable. I already have update query written by previous developer which I can't change. What I am trying to do is as follows:
Take 1000 records at a time from datatable, records can vary from 1 to 25k.
Update query which is in string, replace IN('values here') clause of that with these 1000 records and fire query against database.
Now, I know there are effecient ways to do it, like bulk insert by use of array binding , but I can't change present coding pattern due to restrictions.
What I have tried to do:
if (dt.Rows.Count>0)
{
foreach (DataRow dr in dt.Rows)
{
reviewitemsend =reviewitemsend + dr["ItemID"].ToString()+ ',';
//If record count is 1000 , execute against database.
}
}
Now above approach is taking me nowwhere and am like struck. So another better aproach which I am thinking is below :
int TotalRecords = dt.rows.count;
If (TotalRecords <1000 && TotalRecords >0 )
//Update existing query with this records by placing them in IN cluse and execute
else
{
intLoopCounter = TotalRecords/1000; //Manage for extra records, as counter will be whole number, so i will check modulus division also, if that is 0, means no need for extra counter, if that is non zero, intLoopCounter increment by 1
for(int i= 0;i < intLoopCounter; i++)
{
//Take thousand records at a time, unless last counter has less than 1000 records and execute against database
}
}
Also, note update query is below :
string UpdateStatement = #" UPDATE Table
SET column1=<STATUS>,
column2= '<NOTES>',
changed_by = '<CHANGEDBY>',
status= NULL,
WHERE ID IN (<IDS>)";
In above update query, IDS are already replaced with all 25K record ID's, which will be shown to end user like that, internally only I have to execute it as separate chunks, So within IN() cluase I need to insert 1k records at a time
You can split your Datatable using this linq method:
private static List<List<DataRow>> SplitDataTable(DataTable table, int pageSize)
{
return
table.AsEnumerable()
.Select((row, index) => new { Row = row, Index = index, })
.GroupBy(x => x.Index / pageSize)
.Select(x => x.Select(v => v.Row).ToList())
.ToList();
}
Then run the database query on each chunk:
foreach(List<DataRow> chuck in SplitDataTable(dt, 1000))
{
foreach(DataRow row in chuck)
{
// prepare data from row
}
// execute against database
}
Tip: you can modify the split query to prepare your data directly inside of it (by replacing the x.Select(v => v.Row) part, instead of looping twice on that huge DataTable.
I have to update every row in a Sql Server table with about 150,000 records using entity framework. To reduce the amount of hits the server takes, I would like to do this in separate batches of 1000 rows. I need entity framework to:
Select the first 1000 rows from the DB.
Update those rows.
Call SaveChanges() method.
Get next 1000 rows.
Repeat.
Whats the best way to achieve this?
I'm using entity framework 4 and SQL Server 2012.
Use LINQ Skip & Take:
return query.Skip(HOW MUCH TO SKIP -AT THE BEGINNING WILL BE ZERO-)
.Take(HOW MUCH TO TAKE -THE NUMBER OF YOUR PAGING SIZE-).ToList();
If you want to do it within a loop you can do something like this:
int pagingIncrement = 1000;
for (int i = 0; i <= 150 000; i=i+pagingIncrement)
{
var query = your actual LINQ query.
var results = query.Skip(i).Take(pagingIncrement);
UpdatePartialResults(results);
}
Note: It is important that while updating those rows you don't update the criteria for the ORDER BY within your actual LINQ query, otherwise you could be end up updating the same results again and again (because of the reordering).
Other idea will be to extend the IEnumerable iterator with some of the previously given ideas such as a Skip(counter).Take(pagingSize and yield result (to be processing kinda asynchronously).
something like this should work:
int skip =0;
int take = 1000;
for (int i = 0; i < 150; i++)
{
var rows = (from x in Context.Table
select x).OrderBy(x => x.id).Skip(skip).Take(take).ToList();
//do some update stuff with rows
skip += 1000;
}
I was asked to do a report that combines 3 different crystal reports that we use. Already those reports are very slow and heavy and making 1 big one was out of the question. SO I created a little apps in VS 2010.
My main problem is this, I have 3 Datatable (same schema) that were created with the Dataset designer that I need to combine. I created an empty table to store the combined value. The queries are already pretty big so combining them in a SQL query is really out of the question.
Also I do not have write access to the SQL server (2005), because the server is maintained by the company that created our MRP program. Although I could always ask support to add a view to the server.
So my 3 datatable consist of Labor Cost, Material Cost and subcontracting Cost. I need to create a total cost table that adds all of the Cost column of each table by ID. All the table have keys to find and select them.
The problem is that when i fetch all of the current job it is ok (500ms for 400 records), because I have a query that will fetch only the working job. Problem is with Inventory, since I do not know since when those Job were finished I have to fetch the entire database (around 10000 jobs with subqueries that each have up to 100 records) and this for my 3 tables. This takes around 5000 to 8000ms, although it is very fast compared to the crystal report there is one problem.
I need to create a summary table that will combine all these different tables I created, But I also need to do them 2 times, 1 time for each date that is outputted. So my data always changes, because they are based on a Date parameter. Right now it will take around 12-20sec to fetch them all.
I need a way to reduce the load time, here is what I tried.
Tried a for loop to combine the 3 tables
Then tried with the DataReader class to read each line and used the FindByKey methods that the dataset designer created to find the value in the other table, and I have to do this 2 time. (it seems to go a little bit faster than the for loop)
Tried with Linq, don't think it is possible, and will it give more performance?
Tried to do a dynamic query that use "WHERE IN Comma Separated List" (that actually doubled the time of execution, compared to fetching all of the database)
Tried to join my Inventory query to the my Cost queries (that also increased the time it took)
1 - So is there any way to combine my tables more effectively? What is the fastest way to Merge and Sum my records of my 3 tables?
2 - Is there any way to increase performance of my queries without having write access to the server?
Below is some of the code I used for reference :
public static void Fill()
{
DateTime Date = Data.Date;
AllieesDBTableAdapters.CoutMatTableAdapter mat = new AllieesDBTableAdapters.CoutMatTableAdapter();
AllieesDBTableAdapters.CoutLaborTableAdapter lab = new AllieesDBTableAdapters.CoutLaborTableAdapter();
AllieesDBTableAdapters.CoutSTTableAdapter st = new AllieesDBTableAdapters.CoutSTTableAdapter();
Data.allieesDB.CoutTOT.Clear();
//Around 2 sec each Fill
mat.FillUni(Data.allieesDB.CoutMat, Date);
Data.allieesDB.CoutMat.CopyToDataTable(Data.allieesDB.CoutTOT, LoadOption.OverwriteChanges);
lab.FillUni(Data.allieesDB.CoutLabor, Date);
MergeTable(Data.allieesDB.CoutLabor);
st.FillUni(Data.allieesDB.CoutST, Date);
MergeTable(Data.allieesDB.CoutST);
}
Here is the MergeTable Methods (The For loop I tried is in Comment)
private static void MergeTable(DataTable Table)
{
AllieesDB.CoutTOTDataTable dtTOT = Data.allieesDB.CoutTOT;
DataTableReader r = new DataTableReader(Table);
while (r.Read())
{
DataRow drToT = dtTOT.FindByWO(r.GetValue(2).ToString());
if (drToT != null)
{
drToT["Cout"] = (decimal)drToT["Cout"] + (decimal)r.GetValue(3);
} else
{
EA_CoutsDesVentes.AllieesDB.CoutTOTRow row = dtTOT.NewCoutTOTRow();
for (int j = 0; j < r.FieldCount; j++)
{
if (r.GetValue(j) != null)
{
row[j] = r.GetValue(j);
} else
{
row[j] = null;
}
}
dtTOT.AddCoutTOTRow(row);
}
Application.DoEvents();
}
//try
//{
// for (int i = 0; i < Table.Rows.Count; i++)
// {
// DataRow drSource = Table.Rows[i];
// DataRow drToT = dtTOT.FindByWO(drSource["WO"].ToString());
//if (drToT != null)
//{
// drToT["Cout"] = (decimal)drToT["Cout"] + (decimal)drSource["Cout"];
//} else
//{
//
// EA_CoutsDesVentes.AllieesDB.CoutTOTRow row = dtTOT.NewCoutTOTRow();
// for (int j = 0; j < drSource.Table.Columns.Count; j++)
// {
// if (drSource[j] != null)
// {
// row[j] = drSource[j];
// } else
// {
// row[j] = null;
// }
// }
// dtTOT.AddCoutTOTRow(row);
//}
//Application.DoEvents();
// }
//} catch (Exception)
//{
//}
On Sql Server 2005 and up, you can create a materialized view of the aggregate values and dramatically speed up the performance.
look at Improving Performance with SQL Server 2005 Indexed Views