Iterating over Linq-to-Entities IEnumerable causes OutOfMemoryException - c#

The part of the code I'm working on receives an
IEnumerable<T> items
where each item is an instance of a class whose properties map to an MSSQL database table.
The database table has a total of 953,664 rows.
In code, the dataset is filtered down to 284,360 rows.
The following code throws an OutOfMemoryException when the process reaches about 1.5 GB of allocated memory.
private static void Save<T>(IEnumerable<T> items, IList<IDataWriter> dataWriters, IEnumerable<PropertyColumn> columns) where T : MyTableClass
{
    foreach (var item in items)
    {
    }
}
The variable items is of type
IQueryable<MyTableClass>
I can't find anyone with the same setup, and the solutions I've found for other cases don't apply here.
I've also tried paging, using Skip and Take with a page size of 500, but that just takes a long time and ends up with the same result. It seems like objects aren't being released after each iteration. Why is that?
How can I rewrite this code to cope with a larger collection set?

Well, as Servy has already said, you didn't provide all of your code, so I'll make some guesses.
If you get the exception in foreach (var item in items) even when you are using paging, then I suspect something is wrong with the paging itself. I wrote a couple of examples to explain my idea.
In the first example I suggest (just as a test) that you move the filtering and paging inside the Save function:
private static void Save<T>(IQueryable<T> items, IList<IDataWriter> dataWriters, IEnumerable<PropertyColumn> columns) where T : MyTableClass
{
    int pageSize = 500; // Only 500 records are loaded per round trip.
    int currentStep = 0;
    while (true)
    {
        // Each pass issues a new request to the database using your filter.
        var tempList = items.Where(yourFilter)
                            .OrderBy(x => x.Id)     // a stable ordering (here an assumed Id key) is required before Skip in LINQ to Entities
                            .Skip(currentStep * pageSize)
                            .Take(pageSize);

        int loaded = 0;
        foreach (var item in tempList)
        {
            loaded++;
            // If you get the exception here, maybe something is wrong with your dataWriters or columns.
        }

        if (loaded == 0) // No records were loaded, so we can leave.
            break;

        currentStep++;
    }
}
The second example shows how to use paging without any changes to the Save function:
int pageSize = 500;
int currentStep = 0;
while (true)
{
    // Each pass issues a new request to the database using your filter.
    var tempList = items.Where(yourFilter)
                        .OrderBy(x => x.Id)     // a stable ordering (here an assumed Id key) is required before Skip in LINQ to Entities
                        .Skip(currentStep * pageSize)
                        .Take(pageSize)
                        .ToList();              // materialize just this page

    if (tempList.Count == 0)                    // No records were loaded, so we can leave.
        break;

    Save(tempList, dataWriters, columns);       // Call the saving function for this page.
    currentStep++;
}
Try both of them and you'll either resolve your problem or find another place where the exception is raised.
By the way, another potential culprit is your dataWriters. I guess that's where you store all of the data you receive from the database. Maybe you shouldn't keep all of it in memory? Try estimating how much memory all of those objects require.
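If you want a rough idea of how much memory one materialized page of rows occupies, here is a minimal sketch; GC.GetTotalMemory only gives an approximation, and yourFilter plus the Id key are the same placeholders/assumptions as in the examples above.

// Rough estimate of the memory held by one materialized page of rows.
long before = GC.GetTotalMemory(forceFullCollection: true);

var page = items.Where(yourFilter)      // placeholder filter from the examples above
                .OrderBy(x => x.Id)     // assumed key property
                .Take(500)
                .ToList();              // materialize the page

long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine("~{0} KB for {1} rows", (after - before) / 1024, page.Count);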
P.S. Don't use while (true) in real code; it's just for the example. :)
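For reference, here is a minimal sketch of the same paging loop without while (true), driven by the number of rows actually loaded; yourFilter and the Id key are placeholders/assumptions as above.

int pageSize = 500;
int currentStep = 0;
int loaded;

do
{
    var page = items.Where(yourFilter)          // placeholder filter
                    .OrderBy(x => x.Id)         // assumed key property
                    .Skip(currentStep * pageSize)
                    .Take(pageSize)
                    .ToList();

    loaded = page.Count;
    if (loaded > 0)
        Save(page, dataWriters, columns);

    currentStep++;
}
while (loaded == pageSize);                     // a short (or empty) page means we reached the end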

Related

Can Sitecore (ContentSearch) SeachMaxResults for Solr be redefined at runtime?

Currently I am using the default configuration value of:
<setting name="ContentSearch.SearchMaxResults" value="500" />
I need, for a specific Solr (ContentSearch) query, to return all items of a specific Template ID. The total returned will be in excess of 1200 items.
I tried using the paging feature to override SearchMaxResults by invoking a query as follows:
var query = context.GetQueryable<SearchResultItem>().Filter(i => i["_template"].Equals(variantTemplateId));
query = query.Page(1, 1500);
var results = query.GetResults();
However, I still only receive a single page of 500 items as the 1500 Page Size won't override the SearchMaxResults value of 500.
I really don't want to increase SearchMaxResults for all queries, as it's going to have a negative impact on search overall. It would be ideal if I could set this parameter to "" (unlimited results) temporarily, run my query, and reset it back to the default, but I don't see a way to do this. I also cannot use GetDescendants() as a means of acquiring all these items, as it negatively impacts site performance, even if I only do it one time and store my results in the Memory Cache.
Any direction would be greatly appreciated.
As you say, it's good to keep SearchMaxResults at a reasonably low number, such as 500. When you know you might need to fetch more data, you can perform several queries in a loop, for example like this:
int skip = 0;
const int chunkSize = 500;
bool fetchMore = true;

while (fetchMore) {
    var q = context.GetQueryable<MyModel>()
        .Filter(....)
        ...
        .Skip(skip).Take(chunkSize)
        .Select(d => new { d.field1, d.field2, ... })
        .GetResults();

    var cnt = 0;
    foreach (var doc in q.Hits) {
        // do stuff
        cnt++;
    }

    skip += cnt;
    fetchMore = cnt == chunkSize;
}
As noted above, I've used the Select method to limit the number of fields returned. This sets the Solr fl parameter to just the fields you need. Otherwise fl=*,score will be used, which sends a lot of data over the network, and deserializing it can be quite heavy. (I have a separate post on this here: https://mikael.com/2019/01/optimize-sitecore-solr-queries/)

Finalise SQLite 3 statement

I'm developing a Metro app using the Windows 8 Release Preview and C# (VS 2012). I'm new to SQLite; I integrated SQLite 3.7.13 into my app and it is working fine. Observe my code below:
var dbPath = Path.Combine(Windows.Storage.ApplicationData.Current.LocalFolder.Path, "Test.db");
using (var db = new SQLite.SQLiteConnection(dbPath))
{
    var data = db.Table<tablename>().Where(tablename => tablename.uploaded_bool == false && tablename.Sid == 26);
    try
    {
        int iDataCount = data.Count();
        int id;
        if (iDataCount > 0)
        {
            for (int i = 0; i < iDataCount; i++)
            {
                Elements = data.ElementAt(i);
                id = Elements.id;
                /*
                    Doing some code
                */
            }
            int i = db.Delete<tablename>(new tablename() { Sid = 26 });
        }
    }
    catch (Exception ex)
    {
    }
}
where "Sid" is column in my database and with number "26" i will get n number of rows
So, using a for loop i need to do some code and after the for loop I need to delete records of Sid(26) in database, So at this line
int i = db.Delete<tablename>(new tablename() { Sid = 26 });
I'm getting unable to close due to unfinalised statements exception, So my question is how to finalise the statement in sqlite3,Apparently SQLite3 has a finalize method for destroying previous DB calls but I am not sure how to implement this. Please help me.
Under the covers sqlite-net does some amazing things in an attempt to manage queries and connections for you.
For example, the line
var data = db.Table<tablename>().Where(...)
does not actually establish a connection or execute anything against the database. Instead, it creates an instance of a class called TableQuery, which is enumerable.
When you call
int iDataCount = data.Count();
TableQuery actually executes
GenerateCommand("count(*)").ExecuteScalar<int>();
When you call
Elements = data.ElementAt(i);
TableQuery actually calls
return Skip(index).Take(1).First();
Take(1).First() eventually calls GetEnumerator, which compiles a SQLite command, executes it with a LIMIT of 1, and serializes the result back into your data class.
So, basically, every time you call data.ElementAt you are executing another query. This is different from standard .NET enumerations, where you are just accessing an element in a collection or array.
I believe this is the root of your problem. I would recommend that instead of getting the count and using a for (i, ...) loop, you simply do foreach (tablename tn in data). This causes all records to be fetched in a single query instead of record by record in the loop. That alone may be enough to close the query and allow you to delete from the table during the loop. If not, I recommend you create a collection and add each Sid to it during the loop, then go back and delete those Sids in a second pass after the loop. Worst case, you could close the connection between the loops.
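As a rough illustration of that suggestion (reusing the db, data, and tablename members from the question, mirroring its Delete call, and assuming System.Collections.Generic and System.Linq are available for List<T> and Distinct), the loop could look something like this:

var sidsToDelete = new List<int>();

// foreach enumerates the query once instead of issuing one query per ElementAt call.
foreach (tablename tn in data)
{
    int id = tn.id;
    /*
        Doing some code
    */
    sidsToDelete.Add(tn.Sid);
}

// Second pass: delete only after the enumeration has finished, so no statement is still open.
foreach (var sid in sidsToDelete.Distinct())
{
    db.Delete<tablename>(new tablename() { Sid = sid });
}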
Hope that helps.

Cache only parts of an object

I'm trying to achieve a super-fast search, and decided to rely heavily on caching to achieve this. The order of events is as follows;
1) Cache what can be cached (from entire database, around 3000 items)
2) When a search is performed, pull the entire result set out of the cache
3) Filter that result set based on the search criteria. Give each search result a "relevance" score.
4) Send the filtered results down to the database via xml to get the bits that can't be cached (e.g. prices)
5) Display the final results
This is all working and going at lightning speed, but in order to achieve (3) I've given each result a "relevance" score. This is just an integer member on each search result object. I iterate through the entire result set, update this score accordingly, and then order by it at the end.
The problem I am having is that the "relevance" member retains its value from search to search. I assume this is because what I am updating is a reference to the search results in the cache, rather than a new object, so updating it also updates the cached version. What I'm looking for is a tidy way around this. What I've come up with so far is either:
a) Clone the cache when I get it.
b) Create a separate dictionary to store relevances in and match them up at the end.
Am I missing a really obvious and clean solution, or should I go down one of these routes? I'm using C# and .NET.
Hopefully the description makes clear what I'm getting at; here's some code anyway. This first piece is the iteration through the cached results in order to do the filtering:
private List<QuickSearchResult> performFiltering(string keywords, string regions, List<QuickSearchResult> cachedSearchResults)
{
    List<QuickSearchResult> filteredItems = new List<QuickSearchResult>();

    string upperedKeywords = keywords.ToUpper();
    string[] keywordsArray = upperedKeywords.Split(' ');
    string[] regionsArray = regions.Split(',');

    foreach (var item in cachedSearchResults)
    {
        //Check for keywords
        if (keywordsArray != null)
        {
            if (!item.ContainsKeyword(upperedKeywords, keywordsArray))
                continue;
        }

        //Check for regions
        if (regionsArray != null)
        {
            if (!item.IsInRegion(regionsArray))
                continue;
        }

        filteredItems.Add(item);
    }

    return filteredItems.OrderBy(t => t.Relevance).Take(_maxSearchResults).ToList<QuickSearchResult>();
}
And here is an example of the IsInRegion method of the QuickSearchResult object:
public bool IsInRegion(string[] regions)
{
    int relevanceScore = 0;

    foreach (var region in regions)
    {
        int parsedRegion = 0;
        if (int.TryParse(region, out parsedRegion))
        {
            foreach (var thisItemsRegion in this.Regions)
            {
                if (thisItemsRegion.ID == parsedRegion)
                    relevanceScore += 10;
            }
        }
    }

    Relevance += relevanceScore;
    return relevanceScore > 0;
}
And basically, if I search for "london" I get a score of 10 the first time, 20 the second time...
If you use the NetDataContractSerializer to serialize your objects in the cache, you can use the [DataMember] attribute to control what gets serialized and what doesn't. For instance, you could store your temporary calculated relevance value in a member that is not serialized.
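A minimal sketch of that idea (the members shown are illustrative, not taken from the question's code): members marked [DataMember] round-trip through the serializer, while the relevance member is left unmarked, so each deserialized copy starts at zero.

using System.Runtime.Serialization;

[DataContract]
public class QuickSearchResult
{
    [DataMember]
    public int Id { get; set; }          // serialized into the cache

    [DataMember]
    public string Name { get; set; }     // serialized into the cache

    // Not marked [DataMember]: the serializer skips it, so every
    // deserialized copy starts with Relevance == 0.
    public int Relevance { get; set; }
}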

Iterating through IQueryable with foreach results in an out of memory exception

I'm iterating through a smallish (~10GB) table with a foreach / IQueryable and LINQ-to-SQL.
Looks something like this:
using (var conn = new DbEntities() { CommandTimeout = 600 * 100 })
{
    var dtable = conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1);

    foreach (var dailyResult in dtable)
    {
        //Math here, results stored in-memory, but this table is very small.
        //At the very least compared to stuff I already have in memory. :)
    }
}
The Visual Studio debugger throws an out-of-memory exception after a short while at the base of the foreach loop. I'm assuming that the rows of dtable are not being flushed. What to do?
The IQueryable<DailyResult> dtable will attempt to load the entire query result into memory when enumerated... before any iterations of the foreach loop. It does not load one row at a time during the foreach loop. If you want that behavior, use a DataReader.
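For illustration, here is a minimal sketch of the DataReader approach with plain ADO.NET (System.Data.SqlClient); the connection string and the column name in the comment are placeholders, not taken from the question.

// Streams rows one at a time instead of materializing (or tracking) the whole result set.
using (var connection = new SqlConnection("your-connection-string"))
using (var command = new SqlCommand(
    "SELECT * FROM DailyResults WHERE DailyTransactionTypeID = 1", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Read only the columns the math needs, e.g. (hypothetical column):
            // var amount = reader.GetDecimal(reader.GetOrdinal("Amount"));
        }
    }
}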
You call ~10 GB smallish? You have a nice sense of humor!
You might consider loading rows in chunks, aka pagination.
conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1).Skip(x).Take(y);
Using DataReader is a step backward unless there is a way to use it within LINQ. I thought we were trying to get away from ADO.
The solution suggested above works, but it's truly ugly. Here is my code:
int iTake = 40000;
int iSkip = 0;
int iLoop;
ent.CommandTimeout = 6000;

while (true)
{
    iLoop = 0;
    IQueryable<viewClaimsBInfo> iInfo = (from q in ent.viewClaimsBInfo
                                         where q.WorkDate >= dtStart &&
                                               q.WorkDate <= dtEnd
                                         orderby q.WorkDate
                                         select q)
                                        .Skip(iSkip).Take(iTake);

    foreach (viewClaimsBInfo qInfo in iInfo)
    {
        iLoop++;
        if (lstClerk.Contains(qInfo.Clerk.Substring(0, 3)))
        {
            /// Various processing....
        }
    }

    if (iLoop < iTake)
        break;

    iSkip += iTake;
}
You can see that I have to check for having run out of records because the foreach loop will end at 40,000 records. Not good.
Updated 6/10/2011: Even this does not work. At 2,000,000 records or so, I get an out-of-memory exception. It is also excruciatingly slow. When I modified it to use OleDB, it ran in about 15 seconds (as opposed to 10+ minutes) and didn't run out of memory. Does anyone have a LINQ solution that works and runs quickly?
Use .AsNoTracking() - it tells DbEntities not to track (and keep references to) the retrieved rows:
using (var conn = new DbEntities() { CommandTimeout = 600 * 100 })
{
    var dtable = conn.DailyResults
                     .AsNoTracking()   // <<<<<<<<<<<<<<
                     .Where(dr => dr.DailyTransactionTypeID == 1);

    foreach (var dailyResult in dtable)
    {
        //Math here, results stored in-memory, but this table is very small.
        //At the very least compared to stuff I already have in memory. :)
    }
}
I would suggest using SQL instead to modify this data.

Out of memory when creating a lot of objects C#

I'm processing 1 million records in my application, which I retrieve from a MySQL database. To do so I'm using LINQ to get the records and use .Skip() and .Take() to process 250 records at a time. For each retrieved record I need to create 0 to 4 Items, which I then add to the database. So on average around 2 million Items in total have to be created.
IQueryable<Object> objectCollection = dataContext.Repository<Object>();
int amountToSkip = 0;
IList<Object> objects = objectCollection.Skip(amountToSkip).Take(250).ToList();

while (objects.Count != 0)
{
    using (dataContext = new LinqToSqlContext(new DataContext()))
    {
        foreach (Object objectRecord in objects)
        {
            // Create 0 - 4 Random Items
            for (int i = 0; i < Random.Next(0, 4); i++)
            {
                Item item = new Item();
                item.Id = Guid.NewGuid();
                item.Object = objectRecord.Id;
                item.Created = DateTime.Now;
                item.Changed = DateTime.Now;
                dataContext.InsertOnSubmit(item);
            }
        }
        dataContext.SubmitChanges();
    }
    amountToSkip += 250;
    objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
}
Now the problem arises when creating the Items. When running the application (and not even using dataContext) the memory increases consistently. It's like the items are never getting disposed. Does anyone notice what I'm doing wrong?
Thanks in advance!
OK, I've just discussed this situation with a colleague of mine and we've come to the following solution, which works!
const int pageSize = 250;
int amountToSkip = 0;
var finished = false;

while (!finished)
{
    using (var dataContext = new LinqToSqlContext(new DataContext()))
    {
        var objects = dataContext.Repository<Object>().Skip(amountToSkip).Take(pageSize).ToList();

        if (objects.Count == 0)
            finished = true;
        else
        {
            foreach (Object objectRecord in objects)
            {
                // Create 0 - 4 Random Items
                for (int i = 0; i < Random.Next(0, 4); i++)
                {
                    Item item = new Item();
                    item.Id = Guid.NewGuid();
                    item.Object = objectRecord.Id;
                    item.Created = DateTime.Now;
                    item.Changed = DateTime.Now;
                    dataContext.InsertOnSubmit(item);
                }
            }
            dataContext.SubmitChanges();
        }

        // Advance by the page size so we don't go over the same Items again.
        amountToSkip += pageSize;
    }
}
With this implementation we dispose of the DataContext (and everything it tracked for the Skip()/Take() batch) every time, and thus don't leak memory!
Ahhh, the good old InsertOnSubmit memory leak. I've encountered it and bashed my head against the wall many times when trying to load data from large CSV files using LINQ to SQL. The problem is that even after calling SubmitChanges, the DataContext continues to track all objects that have been added using InsertOnSubmit. The solution is to call SubmitChanges after a certain number of objects, then create a new DataContext for the next batch. When the old DataContext is garbage collected, so are all the inserted objects it tracks (and that you no longer require).
"But wait!" you say, "Creating and disposing of many DataContexts will have a huge overhead!" Well, not if you create a single database connection and pass it to each DataContext constructor. That way, a single connection to the database is maintained throughout, while the DataContext itself is otherwise a lightweight object that represents a small unit of work and should be discarded once it is complete (in your example, after submitting a certain number of records).
My best guess here would be that the IQueryable is causing the memory leak.
Maybe there is no proper MySQL implementation of the Take/Skip methods and the paging is being done in memory? Stranger things have happened, but your loop looks fine. All references should go out of scope and get garbage collected.
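One way to check that guess, assuming the underlying context is a System.Data.Linq.DataContext and using a hypothetical YourEntity in place of the question's Object placeholder, is to look at the SQL the provider actually generates for a paged query:

// If Skip/Take are translated by the provider, the generated SQL contains
// the paging clauses (LIMIT/OFFSET for MySQL); if not, paging happens in memory.
var page = dataContext.GetTable<YourEntity>().Skip(500).Take(250);
Console.WriteLine(dataContext.GetCommand(page).CommandText);

// Alternatively, log every command the context sends:
dataContext.Log = Console.Out;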
Well yeah.
So at the end of that loop you'll attempt to have 2 million items in your list, no? Seems to me that the answer is trivial: store fewer items or get more memory.
-- Edit:
It's possible I've read it wrong; I'd probably need to compile and test it, but I can't do that now. I'll leave this here, but I could be wrong. I haven't reviewed it carefully enough to be definitive; nevertheless, the answer may prove useful, or not. (Judging by the downvote, I guess not :P)
Have you tried declaring the Item outside the loop like this:
IQueryable<Object> objectCollection = dataContext.Repository<Object>();
int amountToSkip = 0;
IList<Object> objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
Item item = null;

while (objects.Count != 0)
{
    using (dataContext = new LinqToSqlContext(new DataContext()))
    {
        foreach (Object objectRecord in objects)
        {
            // Create 0 - 4 Random Items
            for (int i = 0; i < Random.Next(0, 4); i++)
            {
                item = new Item();
                item.Id = Guid.NewGuid();
                item.Object = objectRecord.Id;
                item.Created = DateTime.Now;
                item.Changed = DateTime.Now;
                dataContext.InsertOnSubmit(item);
            }
        }
        dataContext.SubmitChanges();
    }
    amountToSkip += 250;
    objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
}
