I'm processing 1 million records in my application, which I retrieve from a MySQL database. I'm using LINQ to get the records, with .Skip() and .Take() to process 250 records at a time. For each retrieved record I need to create 0 to 4 Items, which I then add to the database. So the total number of Items to be created averages around 2 million.
IQueryable<Object> objectCollection = dataContext.Repository<Object>();
int amountToSkip = 0;
IList<Object> objects = objectCollection.Skip(amountToSkip).Take(250).ToList();

while (objects.Count != 0)
{
    using (dataContext = new LinqToSqlContext(new DataContext()))
    {
        foreach (Object objectRecord in objects)
        {
            // Create 0 - 4 random Items (Random.Next's upper bound is exclusive,
            // and hoisting the call keeps the loop bound stable)
            int itemCount = Random.Next(0, 5);
            for (int i = 0; i < itemCount; i++)
            {
                Item item = new Item();
                item.Id = Guid.NewGuid();
                item.Object = objectRecord.Id;
                item.Created = DateTime.Now;
                item.Changed = DateTime.Now;
                dataContext.InsertOnSubmit(item);
            }
        }
        dataContext.SubmitChanges();
    }
    amountToSkip += 250;
    objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
}
Now the problem arises when creating the Items. While the application runs (even without touching dataContext), memory usage increases steadily. It's as if the items are never released. Does anyone see what I'm doing wrong?
Thanks in advance!
OK, I've just discussed this situation with a colleague of mine and we've come to the following solution, which works!
int amountToSkip = 0;
var finished = false;

while (!finished)
{
    using (var dataContext = new LinqToSqlContext(new DataContext()))
    {
        var objects = dataContext.Repository<Object>().Skip(amountToSkip).Take(250).ToList();
        if (objects.Count == 0)
        {
            finished = true;
        }
        else
        {
            foreach (Object objectRecord in objects)
            {
                // Create 0 - 4 random Items (Random.Next's upper bound is exclusive)
                int itemCount = Random.Next(0, 5);
                for (int i = 0; i < itemCount; i++)
                {
                    Item item = new Item();
                    item.Id = Guid.NewGuid();
                    item.Object = objectRecord.Id;
                    item.Created = DateTime.Now;
                    item.Changed = DateTime.Now;
                    dataContext.InsertOnSubmit(item);
                }
            }
            dataContext.SubmitChanges();
        }
        // Advance by the batch size so we don't process the same records again
        amountToSkip += 250;
    }
}
With this implementation we dispose of the Skip() and Take() cache every time, and thus don't leak memory!
Ahhh, the good old InsertOnSubmit memory leak. I've encountered it and bashed my head against the wall many times when trying to load data from large CSV files using LINQ to SQL. The problem is that even after calling SubmitChanges, the DataContext continues to track all objects that have been added using InsertOnSubmit. The solution is to SubmitChanges after a certain number of objects, then create a new DataContext for the next batch. When the old DataContext is garbage collected, so will be all the inserted objects it tracked (and that you no longer require).
"But wait!" you say, "Creating and disposing of many DataContexts will have huge overhead!" Well, not if you create a single database connection and pass it to each DataContext constructor. That way a single connection to the database is maintained throughout, and the DataContext is otherwise a lightweight object that represents a small unit of work and should be discarded once that work is complete (in your example, submitting a certain number of records).
My best guess here is that the IQueryable is causing the memory leak.
Maybe there is no appropriate MySQL implementation of the Skip/Take methods and the paging is being done in memory? Stranger things have happened, but your loop looks fine; all references should go out of scope and get garbage collected.
Well yeah.
So at the end of that loop you'll attempt to have 2 million items in your list, no? Seems to me that the answer is trivial: store fewer items or get more memory.
-- Edit:
It's possible I've read it wrong; I'd probably need to compile and test it, but I can't do that now. I'll leave this here, but I could be wrong. I haven't reviewed it carefully enough to be definitive; nevertheless, the answer may prove useful, or not. (Judging by the downvote, I guess not :P)
Have you tried declaring the Item outside the loop like this:
IQueryable<Object> objectCollection = dataContext.Repository<Object>();
int amountToSkip = 0;
IList<Object> objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
Item item = null;

while (objects.Count != 0)
{
    using (dataContext = new LinqToSqlContext(new DataContext()))
    {
        foreach (Object objectRecord in objects)
        {
            // Create 0 - 4 random Items (Random.Next's upper bound is exclusive)
            int itemCount = Random.Next(0, 5);
            for (int i = 0; i < itemCount; i++)
            {
                item = new Item();
                item.Id = Guid.NewGuid();
                item.Object = objectRecord.Id;
                item.Created = DateTime.Now;
                item.Changed = DateTime.Now;
                dataContext.InsertOnSubmit(item);
            }
        }
        dataContext.SubmitChanges();
    }
    amountToSkip += 250;
    objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
}
PROBLEM:
If I put SaveChangesAsync outside of the loop, it saves only the last entity added with _context.Add(attdef);
Why is that so?
First I thought it was because I have auto-increment, but when I disabled it, it still did not work.
Using SaveChanges instead of SaveChangesAsync does not fix the problem either.
Updating data, however, works fine.
GameController.cs
for (int i = 0; i < editViewModel.TowerAttack.Count; i++)
{
    tower = _context.Tower.First(m => m.TowerId == editViewModel.TowerId[i]);
    tower.Attack -= editViewModel.TowerAttack[i];
    _context.Update(tower);

    attdef.Id = 0; // AutoIncrement
    attdef.Amount = attackSum;
    _context.Add(attdef);
}
await _context.SaveChangesAsync();
I think you have declared the attdef variable somewhere above, and in the loop you're updating the same reference and adding it to the context again. Because of this, only a single item ends up being added to the context. A better way is to do something like this:
var attdefs = new List<Attdef>();
for (int i = 0; i < editViewModel.TowerAttack.Count; i++)
{
    tower = _context.Tower.First(m => m.TowerId == editViewModel.TowerId[i]);
    tower.Attack -= editViewModel.TowerAttack[i];
    _context.Update(tower);

    attdefs.Add(new Attdef { Id = 0, Amount = attackSum });
}
_context.AddRange(attdefs); // I don't remember the exact syntax, but this should be the faster way
await _context.SaveChangesAsync();
Is attdef declared outside the loop? Are you just updating the same object with each loop? I would expect only the latest version of that object to be added if that's the case.
If you're trying to add several new objects, try declaring attdef within the loop so you're working with a new object each time.
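For instance, a minimal sketch against the loop from the question (Attdef's shape is taken from the snippets above):

for (int i = 0; i < editViewModel.TowerAttack.Count; i++)
{
    tower = _context.Tower.First(m => m.TowerId == editViewModel.TowerId[i]);
    tower.Attack -= editViewModel.TowerAttack[i];
    _context.Update(tower);

    // A fresh instance per iteration, so each Add tracks a distinct entity.
    var attdef = new Attdef { Id = 0, Amount = attackSum };
    _context.Add(attdef);
}
await _context.SaveChangesAsync();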
Hi, I'm trying to set up some simple test data.
I simply want to take a smallish collection and make it bigger by adding it to itself multiple times.
After I've added them together I want to renumber the LineNumber property
so that there are no duplicates and it goes in order: 1, 2, 3, 4...
No matter what I try, it doesn't seem to work, and I can't see the mistake.
var sampleTemplateLine = dataContext.TemplateFileLines.ToList();

// tried this, doesn't work either
//List<TemplateFileLine> lineRange = new List<TemplateFileLine>();
//lineRange.AddRange(sampleTemplateLine);
//lineRange.AddRange(sampleTemplateLine);
//lineRange.AddRange(sampleTemplateLine);
//lineRange.AddRange(sampleTemplateLine);

var allProducts = sampleTemplateLine
    .Concat(sampleTemplateLine)
    .Concat(sampleTemplateLine)
    .Concat(sampleTemplateLine)
    .ToList();

int i = 1;
foreach (var item in allProducts)
{
    item.LineNumber = i;
    i++;
}
This doesn't seem to work either:
// re-number the line numbers
var total = allProducts.Count();
for (int i = 0; i < total; i++)
{
    allProducts[i].LineNumber = i + 1;
}
PROBLEM: the query below returns 4 items when I'm expecting 1.
var itemthing = allProducts.Where(x => x.LineNumber == 17312).ToList();
You are adding the same objects multiple times. You would have to add new objects or clone the ones you have.
The problem is that they all point to the same object, so if you change a property, it changes in every place that object appears at the same time.
You can use a Clone method if one exists; if not, you can create your own Clone method, as in this question.
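A minimal sketch of such a clone for this case (assuming TemplateFileLine is a plain entity class; copy whichever properties your test data actually needs):

// Hypothetical shallow-copy helper for the entity in the question.
static TemplateFileLine Clone(TemplateFileLine source)
{
    return new TemplateFileLine
    {
        LineNumber = source.LineNumber
        // ...copy the remaining properties here...
    };
}

// Build the enlarged list from fresh copies; every element is now a
// distinct object, so renumbering one line no longer changes four.
var allProducts = Enumerable.Range(0, 4)
    .SelectMany(_ => sampleTemplateLine.Select(Clone))
    .ToList();

for (int i = 0; i < allProducts.Count; i++)
{
    allProducts[i].LineNumber = i + 1;
}

With distinct objects, the Where(x => x.LineNumber == 17312) query returns one item, as expected.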
The part of the code I'm working on receives an
IEnumerable<T> items
where each item contains a class with properties reflecting an MSSQL database table.
The database table has a total count of 953664 rows.
The dataset in code is filtered down to a set of 284360 rows.
The following code throws an OutOfMemoryException when the process reaches about 1.5 GB of allocated memory.
private static void Save<T>(IEnumerable<T> items, IList<IDataWriter> dataWriters, IEnumerable<PropertyColumn> columns) where T : MyTableClass
{
    foreach (var item in items)
    {
    }
}
The variable items is of type
IQueryable<MyTableClass>
I can't find anyone with the same setup, and others' solutions that I've found don't apply here.
I've also tried paging, using Skip and Take with a page size of 500, but that just takes a long time and ends with the same result. It seems like objects aren't being released after each iteration. How is that?
How can I rewrite this code to cope with a larger collection set?
Well, as Servy has already said, you didn't provide your full code, so I'll try to make some predictions... (Sorry for my English.)
If you get an exception in "foreach (var item in items)" when you are using paging then, I guess, something is wrong with the paging. I wrote a couple of examples to explain my idea.
In the first example I suggest you (just as a test) put your filter inside the Save function.
private static void Save<T>(IQueryable<T> items, IList<IDataWriter> dataWriters, IEnumerable<PropertyColumn> columns) where T : MyTableClass
{
    int pageSize = 500; // Only 500 records will be loaded at a time.
    int currentStep = 0;
    while (true)
    {
        // Here we make a new request to the database using our filter.
        var tempList = items.Where(yourFilter).Skip(currentStep * pageSize).Take(pageSize);
        foreach (var item in tempList)
        {
            // If you get an exception here, something is probably wrong in your dataWriters or columns.
        }
        currentStep++;
        if (tempList.Count() == 0) // No records were loaded, so we can leave.
            break;
    }
}
The second example shows how to use paging without any changes to the Save function:
int pageSize = 500;
int currentStep = 0;
while (true)
{
    // Here we make a new request to the database using our filter.
    var tempList = items.Where(yourFilter).Skip(currentStep * pageSize).Take(pageSize);
    Save(tempList, dataWriters, columns); // Call the Save function.
    currentStep++;
    if (tempList.Count() == 0)
        break;
}
Try both of them and you'll either resolve your problem or find another place where an exception is raised.
By the way, another potential culprit is your dataWriters. I guess that is where you store all the data you've received from the database. Maybe you shouldn't keep all of it? Just calculate the memory that all of those objects require.
P.S. Don't use while(true) in your code; it's just for the example. :)
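For what it's worth, here is a sketch of the same paging loop with its own exit condition instead of while(true); materializing each page with ToList also avoids the extra Count() round trip:

int pageSize = 500;
int currentStep = 0;
List<MyTableClass> page;
do
{
    page = items.Where(yourFilter)
                .Skip(currentStep * pageSize)
                .Take(pageSize)
                .ToList();
    Save(page, dataWriters, columns);
    currentStep++;
} while (page.Count == pageSize); // a short page means we've reached the end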
I'm iterating through a smallish (~10GB) table with a foreach / IQueryable and LINQ-to-SQL.
Looks something like this:
using (var conn = new DbEntities() { CommandTimeout = 600 * 100 })
{
    var dtable = conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1);
    foreach (var dailyResult in dtable)
    {
        // Math here, results stored in-memory, but this table is very small.
        // At the very least compared to stuff I already have in memory. :)
    }
}
The Visual Studio debugger throws an out-of-memory exception after a short while at the base of the foreach loop. I'm assuming that the rows of dtable are not being flushed. What to do?
The IQueryable<DailyResult> dtable will attempt to load the entire query result into memory when enumerated... before any iteration of the foreach loop runs. It does not load rows one at a time as the foreach loop iterates. If you want that behavior, use a DataReader.
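A rough sketch of the DataReader route in plain ADO.NET (the table and column names here are guesses based on the entity above):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT * FROM DailyResults WHERE DailyTransactionTypeID = 1",
    connection))
{
    command.CommandTimeout = 600 * 100;
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Only the current row is held in memory; do the math here.
        }
    }
}

Unlike the tracked query above, the reader streams one row at a time, so memory use stays flat regardless of table size.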
You call ~10GB smallish? You have a nice sense of humor!
You might consider loading rows in chunks, aka pagination.
conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1).Skip(x).Take(y);
Using DataReader is a step backward unless there is a way to use it within LINQ. I thought we were trying to get away from ADO.
The solution suggested above works, but it's truly ugly. Here is my code:
int iTake = 40000;
int iSkip = 0;
int iLoop;
ent.CommandTimeout = 6000;
while (true)
{
    iLoop = 0;
    IQueryable<viewClaimsBInfo> iInfo = (from q in ent.viewClaimsBInfo
                                         where q.WorkDate >= dtStart &&
                                               q.WorkDate <= dtEnd
                                         orderby q.WorkDate
                                         select q)
                                        .Skip(iSkip).Take(iTake);
    foreach (viewClaimsBInfo qInfo in iInfo)
    {
        iLoop++;
        if (lstClerk.Contains(qInfo.Clerk.Substring(0, 3)))
        {
            // Various processing....
        }
    }
    if (iLoop < iTake)
        break;
    iSkip += iTake;
}
You can see that I have to check for having run out of records because the foreach loop will end at 40,000 records. Not good.
Updated 6/10/2011: Even this does not work. At 2,000,000 records or so, I get an out-of-memory exception. It is also excruciatingly slow. When I modified it to use OleDB, it ran in about 15 seconds (as opposed to 10+ minutes) and didn't run out of memory. Does anyone have a LINQ solution that works and runs quickly?
Use .AsNoTracking() - it tells DbEntities not to cache retrieved rows
using (var conn = new DbEntities() { CommandTimeout = 600 * 100 })
{
    var dtable = conn.DailyResults
                     .AsNoTracking() // <<<<<<<<<<<<<<
                     .Where(dr => dr.DailyTransactionTypeID == 1);
    foreach (var dailyResult in dtable)
    {
        // Math here, results stored in-memory, but this table is very small.
        // At the very least compared to stuff I already have in memory. :)
    }
}
I would suggest using SQL instead to modify this data.
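For example, a set-based statement in plain ADO.NET (the UPDATE itself is hypothetical, since the question doesn't show what the math does):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "UPDATE DailyResults SET Result = Result * @factor " +
    "WHERE DailyTransactionTypeID = 1",
    connection))
{
    command.Parameters.AddWithValue("@factor", 2);
    command.CommandTimeout = 600 * 100; // large tables can take a while
    connection.Open();
    int affected = command.ExecuteNonQuery();
}

A single statement like this runs entirely on the server, so no rows ever have to cross the wire into the application's memory.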
Let's look at this code:
IList<IHouseAnnouncement> list = new List<IHouseAnnouncement>();
var table = adapter.GetData(); // get data from repository object -> DataTable
if (table.Rows.Count >= 1)
{
    for (int i = 0; i < table.Rows.Count - 1; i++)
    {
        var anno = new HouseAnnouncement();
        anno.Area = float.Parse(table.Rows[i][table.areaColumn].ToString());
        anno.City = table.Rows[i][table.cityColumn].ToString();
    }
}
return list;
Is there a better way to write this, in less code and better style (there must be :-))? Maybe using a lambda (but let me know how)?
Thanks in advance!
Just FYI, you're never adding the new HouseAnnouncement to your list, and your loop will never execute for the last row, but I'm assuming those are errors in the example rather than in your actual code.
You could do something like this:
return adapter.GetData().Rows.Cast<DataRow>().Select(row =>
    new HouseAnnouncement()
    {
        Area = Convert.ToSingle(row["powierzchnia"]),
        City = (string)row["miasto"],
    }).ToList();
I usually go for readability over brevity, but I feel like this is pretty readable.
Note that while you could still cache the DataTable and use table.powierzchniaColumn in the lambda, I eliminated that so that you didn't use a closure that wasn't really necessary (closures introduce substantial complexity to the internal implementation of the lambda, so I avoid them if possible).
If it's important to you to keep the column references as they are, then you can do it like this:
using (var table = adapter.GetData())
{
    return table.Rows.Cast<DataRow>().Select(row =>
        new HouseAnnouncement()
        {
            Area = Convert.ToSingle(row[table.powierzchniaColumn]),
            City = (string)row[table.miastoColumn],
        }).ToList();
}
This will add complexity to the actual IL that the compiler generates, but should do the trick for you.
You could do something like this in LINQ:
var table = adapter.GetData();
var q = from row in table.Rows.Cast<DataRow>()
        select new HouseAnnouncement()
        {
            Area = float.Parse(row[table.areaColumn].ToString()),
            City = row[table.cityColumn].ToString()
        };
return q.ToList();
Your "if statement" is not necessary. Your "for loop" already takes care of that case.
Also, your "for loop" will not execute when the number of your Table Rows is 1. This seems like a mistake, and not by design, but I could be wrong. If you want to fix this, just take out the "-1":
for (int i = 0; i < table.Rows.Count; i++)
Well, for one thing, you appear to have an off-by-one error:
for (int i = 0; i < table.Rows.Count - 1; i++)
{
}
If your table has three rows, this will run while i is less than 3 - 1, or 2, which means it'll run for rows 0 and 1 but not for row 2. This may not be what you intend.
Can't go much simpler than one for-loop and no if-statements:
var table = adapter.GetData(); // get data from repository object -> DataTable
IList<IHouseAnnouncement> list = new List<IHouseAnnouncement>(table.Rows.Count);
for (int i = 0; i < table.Rows.Count; i++)
{
    var anno = new HouseAnnouncement();
    anno.Area = float.Parse(table.Rows[i][table.areaColumn].ToString());
    anno.City = table.Rows[i][table.cityColumn].ToString();
    list.Add(anno);
}
return list;
It takes more characters than the LINQ version, but a programmer's brain parses it faster. :)
Readability is, to me, preferable to being succinct with your code--as long as performance is not a victim. Also, I am sure that anyone who later has to maintain the code will appreciate it as well.
Even when I am maintaining my own code, I don't want to look at it a couple of months later and think, "What the hell was I trying to accomplish?"
I might do something like this:
var table = adapter.GetData(); // get data from repository object -> DataTable
return table.Rows.Cast<DataRow>().Take(table.Rows.Count - 1).Select(row =>
    new HouseAnnouncement()
    {
        Area = float.Parse(row[table.powierzchniaColumn].ToString()),
        City = row[table.miastoColumn].ToString()
    }).ToList();