Debugging memory leak in VS2015. Massive native heap - C#

I am struggling with a memory issue that I can definitely see, but I don't know where and when exactly it is happening.
My managed heap size seems to be fine (100MB), but the native heap starts growing at some unknown moment and keeps going until it reaches ~2GB and the app crashes.
My application runs many threads and makes a lot of DB connections through EF 6 in many loops.
That's why it's really hard for me to debug the code just by looking at logs or setting breakpoints.
I thought maybe I could spot the issue by looking at memory, but all I can see is that my native heap is mostly filled with objects of 8,192 bytes. So I can see that the problem is really happening, but I still have no clue why.
I am not sure I am using 100% of the capabilities of the Visual Studio memory profiler.
What I can see now is shown in the profiler screenshot attached to the question.
What else, or what more, can I do to find the issue?
Maybe it's a silly question, but I have been working on this problem for two days and I've almost run out of ideas.
I've gone through breakpoints, logs, and code analysis, but I still don't have any clue.
I will be grateful for any idea.
[EDIT] 15:11 2017/02/03
I was able to find the code responsible for the leak, but it still makes no sense to me. How is it possible that this code is causing a massive memory leak?
The code is:
public class DbData : IDisposable
{
    private static readonly object thisLock = new object(); // lock object (declaration not shown in the original snippet)
    private DBEntity db;

    public DbData()
    {
        db = new FruitDBEntity();
    }

    public Fruit AddFruitDefinition(Fruit fruit)
    {
        lock (thisLock)
        {
            var newFruit = db.Fruits.Where(f => f.FruitId == fruit.FruitId)
                                    .Where(f => f.FruitName == fruit.FruitName)
                                    .Where(f => f.FruitColor == fruit.FruitColor)
                                    .FirstOrDefault();
            if (newFruit == null)
            {
                newFruit = db.Fruits.Add(fruit);
                db.SaveChanges();
            }
            return newFruit;
        }
    }

    // Dispose() implementation not shown in the question.
}
The DbData class is created every time I want to use the AddFruitDefinition() method:
using (var data = new DbData())
{
    data.AddFruitDefinition(fruit);
}
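For context (this is an observation, not part of the original question): DbData implements IDisposable, but nothing in the snippet disposes the FruitDBEntity context it creates, and an undisposed EF context keeps change tracking and connection resources alive. Assuming FruitDBEntity derives from EF's DbContext, a minimal sketch of a Dispose that releases it would look like this:

    // Sketch only: dispose the EF context when DbData is disposed, so each
    // "using (var data = new DbData())" block releases its DbContext resources.
    public void Dispose()
    {
        db?.Dispose();   // FruitDBEntity is assumed to derive from System.Data.Entity.DbContext
        db = null;
    }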

First, you need at least two snapshots.
As far as I can see from the image, you took only one snapshot.
What to do?
1. Start the application with the profiler.
2. Perform your usual steps and take a snapshot.
3. Repeat the same steps you did in step 2, and take another snapshot.
4. Stop the application. You should see two snapshots; click on the second snapshot and select Compare to #Snapshot 1.
It might take some time to process the results.
You should be able to see a few additional columns in the report (Identifier, Count, Size, Module, Count Diff., Size Diff.).
The last two columns are the important ones: they tell you which classes used more or less memory the second time.
To sum up: you need to figure out where the memory leak is and fix it, and you will do that by comparing snapshots.
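As a complementary check (a sketch, not part of the original answer), logging both the managed heap size and the process's private bytes around the repeated steps makes it easy to confirm between snapshots that it is the native side that keeps growing:

using System;
using System.Diagnostics;

static class MemoryLog
{
    // Prints the managed heap size vs. total private memory (managed + native) for the process.
    public static void Dump(string label)
    {
        long managedBytes = GC.GetTotalMemory(forceFullCollection: true);
        long privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;
        Console.WriteLine($"{label}: managed = {managedBytes / (1024 * 1024)} MB, private = {privateBytes / (1024 * 1024)} MB");
    }
}

Calling MemoryLog.Dump before and after each round of "usual steps" should show private bytes climbing while the managed number stays flat, which matches the symptom described in the question.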

Related

Console app quits before foreach loop finishes (C#)

This console app ends before reaching the end. I know that because it usually scans a whole table in a DB with 5,000-10,000 rows.
I have no idea why it quits halfway: there are no exceptions or any other information (the logs show nothing, it just "stops"), and after approximately 10-15 minutes it abruptly stops and exits as if everything were fine and it had simply reached the end. When I check the DB, only 1,000-2,000 records have been worked on.
The code (a rather simple function; ask for any clarification and I shall give it. It is not a reproducible example because it is a big project, but I will point out what type of information everything is, with as much detail as possible):
private List<Item> artigos;

public void VerifyID()
{
    // a list containing all the rows in the db
    artigos = itemCore.GetAllSEAAll();
    foreach (var artigo in artigos.ToList())
    {
        try
        {
            //Code Here
        }
        catch (Exception ex)
        {
            logger.LogInformation(ex.ToString());
        }
    }
}
TL;DR on the code:
I get 5,000-10,000 rows from a table.
I check a specific field on each one and compare it to a value (irrelevant to the question) that comes from an API.
I update the DB in order to register any anomalies.
(Updated to include only the needed code.)
It will reproduce the problem I have; it is just less predictable how long it takes until it stops. I am pretty sure it HAS something to do with the garbage collector or memory management.
I have so far tried to stop the garbage collector from collecting my variable (artigos).
That results in a few (3-5) "Object not initialized" errors before it quits.
From debugging I managed to figure out that the list artigos at some point stops existing; GC.KeepAlive() does nothing.
I have searched the web for a similar issue without any luck.
That being said, and given my lack of experience, I am currently stuck on what to do, or even which "path" to take to figure it out. Any help is welcome, and sorry if I am missing anything; I am not familiar with Stack Overflow's structure.
I need to make this clear: I have debugged this program more times than I care to remember.
It is not my code that sets artigos to null.
There is no exception being thrown.
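One way to rule out a silent crash (a sketch, not from the original post; names are illustrative): hook the process-wide exception events at startup, since a failure on another thread or an unobserved task exception can terminate a console app without ever reaching the try/catch inside the loop.

using System;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Log anything that would otherwise tear the process down silently.
        AppDomain.CurrentDomain.UnhandledException += (s, e) =>
            Console.Error.WriteLine($"Unhandled: {e.ExceptionObject}");

        TaskScheduler.UnobservedTaskException += (s, e) =>
        {
            Console.Error.WriteLine($"Unobserved task exception: {e.Exception}");
            e.SetObserved();
        };

        AppDomain.CurrentDomain.ProcessExit += (s, e) =>
            Console.Error.WriteLine("ProcessExit fired");   // confirms whether the exit is "clean"

        // ... run VerifyID() here ...
    }
}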

Seeking possible memory leak with C#/C++ wrapper

I have a C# program that calls a C++ DLL. The wrapper code for the function in question is shown below.
As this function is called repeatedly, memory use continues to grow, and it appears as if there's a memory leak. The issue seems to be associated with the matchingFragments->Add line in the code below: if I comment out that line, memory use is stable.
In previous iterations of this program, where matchingFragments wasn't a list but was set to a fixed number of elements, memory use was stable throughout repeated calls to this function. So I suspect some memory isn't being freed somewhere, but I don't know what the issue is, whether it's matchedFragments, returnedFragments, or neither. Nor do I know any of this well enough (I'm a C developer struggling with this) to know how to debug it, so any suggestions would be appreciated.
bool SearchWrapper::SpectrumSearch([Out] List<FragmentWrapper^>^% returnedFragments)
{
    vector<Fragment> matchedFragments;

    // perform the search
    bool isSuccess = _pSearchMgr->PeptideSearch(matchedFragments);

    // Convert data back to the managed world
    returnedFragments = gcnew List<FragmentWrapper^>();
    for (auto frag : matchedFragments)
    {
        returnedFragments->Add(gcnew FragmentWrapper(frag));
    }
    return isSuccess;
}
It turns out the actual fix for my issue was adding a finalizer to the FragmentWrapper class. There was a destructor but no finalizer; once I added the finalizer, the memory leak went away.
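For readers more at home in C# (this sketch is not the poster's actual C++/CLI code): in C++/CLI the destructor compiles to Dispose() and the !ClassName() finalizer to Finalize(), so a wrapper that only has a destructor frees its native memory only when callers remember to dispose it. The equivalent C# pattern, with a hypothetical NativeFree call standing in for whatever releases the unmanaged fragment data, looks roughly like this:

public sealed class FragmentWrapperSketch : IDisposable
{
    private IntPtr _nativeHandle;       // assumed handle to the unmanaged fragment data

    public void Dispose()               // the C++/CLI destructor ~FragmentWrapper() maps to this
    {
        ReleaseNative();
        GC.SuppressFinalize(this);
    }

    ~FragmentWrapperSketch()            // the C++/CLI finalizer !FragmentWrapper() maps to this
    {
        ReleaseNative();                // safety net when Dispose was never called
    }

    private void ReleaseNative()
    {
        if (_nativeHandle != IntPtr.Zero)
        {
            // NativeFree(_nativeHandle);   // hypothetical call that frees the native memory
            _nativeHandle = IntPtr.Zero;
        }
    }
}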

Memory consumption when initializing object

I am trying to build some objects and insert them into a database. The number of records that have to be inserted is big, on the order of millions.
The insert is done in batches.
The problem I am having is that I need to initialize new objects to add them to a list, and at the end I do a bulk insert of the list into the database. Because I am initializing a huge number of objects, my computer's memory (RAM) fills up and it more or less freezes everything.
The question is: from a memory point of view, should I initialize new objects or set them to null?
Also, I am trying to work with the same object reference. Am I doing it right?
Code:
QACompleted completed = new QACompleted();
QAUncompleted uncompleted = new QAUncompleted();
QAText replaced = new QAText();

foreach (QAText question in questions)
{
    MatchCollection matchesQ = rgx.Matches(question.Question);
    MatchCollection matchesA = rgx.Matches(question.Answer);

    foreach (GetKeyValues_Result item in values)
    {
        hasNull = false;
        replaced = new QAText();                    // <- this object
        if (matchesQ.Count > 0)
        {
            SetQuestion(matchesQ, replaced, question, item);
        }
        else
        {
            replaced.Question = question.Question;
        }
        if (matchesA.Count > 0)
        {
            SetAnswer(matchesA, replaced, question, item);
        }
        else
        {
            replaced.Answer = question.Answer;
        }
        if (!hasNull)
        {
            if (matchesA.Count == 0 && matchesQ.Count == 0)
            {
                completed = new QACompleted();      // <- this object
                MapEmpty(replaced, completed, question.Id);
            }
            else
            {
                completed = new QACompleted();      // <- this object
                MapCompleted(replaced, completed, question.Id, item);
            }
            goodResults.Add(completed);
        }
        else
        {
            uncompleted = new QAUncompleted();      // <- this object
            MapUncompleted(replaced, uncompleted, item, question.Id);
            badResults.Add(uncompleted);
        }
    }
    var success = InsertIntoDataBase(goodResults, "QACompleted");
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}
I have marked the objects. Should I just set them to null (e.g. replaced = null), or should I use the constructor?
What would be the difference between new QAText() and = null?
The memory cost of creating objects
Creating objects in C# always has a memory cost. This relates to the memory layout of objects. Assuming you are using a 64-bit OS, the runtime has to allocate an extra 8 bytes for the sync block and 8 bytes for the method table pointer. After the sync block and method table pointer come your own data fields. Besides this unavoidable 16-byte header, objects are always aligned to an 8-byte boundary and can therefore incur extra padding overhead.
You can roughly estimate the memory overhead if you know exactly how many objects you create. However, I would suggest being careful about assuming that your memory pressure comes from object layout overhead. This is also why I suggest estimating the overhead as a first step: you might end up realizing that even if the layout overhead could magically be removed completely, it would not make a huge difference to memory usage. After all, for a million objects, the object-header overhead is only 16 MB.
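A quick way to do that first-step estimate (a sketch, not from the original answer) is to allocate a large number of small objects and divide the change reported by GC.GetTotalMemory by the count; on x64 an object with a single int field comes out at around 24 bytes, i.e. the 16-byte header plus the field and padding:

using System;

class Probe { public int Value; }    // one 4-byte field; header and alignment dominate

class OverheadEstimate
{
    static void Main()
    {
        const int n = 1000000;
        var keep = new Probe[n];                 // allocate the array first so it is excluded from the delta
        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < n; i++)
        {
            keep[i] = new Probe();
        }
        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"~{(after - before) / (double)n:F1} bytes per object");
        GC.KeepAlive(keep);                      // keep the objects alive until after the measurement
    }
}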
The difference between replaced = new QAText() and replaced = null
I suppose that after you set replaced to null, you still have to create another QAText()? If so, memory-wise there is no real difference to the garbage collector: the old QAText instance will be collected either way, as long as you are not keeping any other reference to it. When to collect the instance, however, is the garbage collector's call; doing replaced = null will not make the collection happen earlier.
You can try to reuse the same QAText instance instead of creating a new one every time, but creating a new one each time will not cause high memory pressure. It will make the GC a little busier and therefore result in somewhat higher CPU usage.
Identify the real cause for high memory usage
If your application really is using a lot of memory, you have to look at the design of your QACompleted and QAUncompleted objects. Those are the objects added to the lists, and they occupy memory until you submit them to the database. If those objects are designed well (they only take the memory they have to take), then, as Peter pointed out, you should use a smaller batch size so you don't have to keep too many of them in memory.
There are other factors in your program that can cause unexpected memory usage. What are the data structures behind goodResults and badResults: List or LinkedList? A List is internally nothing but a dynamic array; it uses a growth policy that doubles its size whenever it is full, and that always-double policy can eat up memory quickly, especially when you have a lot of entries.
LinkedList, on the other hand, does not suffer from that problem, but every single node requires roughly 40 extra bytes.
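If List is what is being used and the total number of results is roughly known up front, pre-sizing the capacity avoids the repeated doubling and the transient copies it produces. A minimal sketch, reusing the type names from the question (the count is whatever estimate you have, not a value from the original post):

// Sketch: allocate the backing arrays once instead of letting them double repeatedly.
int expectedCount = 100000;   // rough estimate of results per batch
var goodResults = new List<QACompleted>(expectedCount);
var badResults = new List<QAUncompleted>(expectedCount);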
It is also worth checking what the MapCompleted and MapUncompleted methods are doing. Are they keeping long-lived references to the replaced object? If so, that will cause a memory leak.
To summarize: when dealing with memory problems, focus on macro-scale issues such as the choice of data structures and actual memory leaks, or optimize your algorithms so that you don't have to keep all the data in memory all the time.
Instantiating a new (albeit empty) object always takes some memory, since space has to be allocated for the object's fields. If you aren't going to access or set any data in the instance, I see no point in creating it.
It's unfortunate that the code example is not written better. There seem to be lots of declarations left out and undocumented side effects in the code. This makes it very hard to offer specific advice.
That said…
Your replaced object does not appear to be retained beyond one iteration of the loop, so it's not part of the problem. The completed and uncompleted objects are added to lists, so they do add to your memory consumption. Likewise the goodResults and badResults lists themselves (where are the declarations for those?).
If you are using a computer with too little RAM, then yes, you'll run into performance issues as Windows uses the disk to make up for the lack of RAM. And even with enough RAM, at some point you could run into .NET's limits on object size (i.e. you can only put so many elements into a list). So one way or the other, you seem to need to reduce your peak memory usage.
You stated that when the data in the lists is inserted into the database, the lists are cleared. So presumably there are so many elements in the values list (one of the undeclared, undocumented variables in your code example) that the lists and their objects grow too large before the inner loop finishes and the data is inserted into the database.
In that case, the simplest way to address the issue is probably to submit the updates in batches inside the inner foreach loop, e.g. by adding something like this at the end of that loop:
if (goodResults.Count >= 100000)
{
    var success = InsertIntoDataBase(goodResults, "QACompleted");
    // assumes InsertIntoDataBase clears the list, as stated in the question
}
if (badResults.Count >= 100000)
{
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}
(Declare the actual cut-off as a named constant, of course, and handle the database insert result return value as appropriate.)
Of course, you would still do the insert at the end of the outer loop too.

C# Unable to clear memory of large generic collection

I am putting two very large datasets into memory, performing a join to filter out a subset from the first collection, and then attempting to destroy the second collection, as it uses approximately 600MB of my system's RAM. The problem is that the code below is not working: after it runs, a foreach loop runs for about 15 minutes, and during this time the memory does NOT drop from 600MB+. Am I doing something wrong?
List<APPLES> tmpApples = dataContext.Apples.ToList(); // 100MB
List<ORANGES> tmpOranges = dataContext.Oranges.ToList(); // 600MB
List<APPLES> filteredApples = tmpApples
.Join(tmpOranges, apples => apples.Id, oranges => oranges.Id, (apples, oranges) => apples).ToList();
tmpOranges.Clear();
tmpOranges = null;
GC.Collect();
Note that I re-use tmpApples later, so I am not clearing it just now.
A few things to note:
Unless your dataContext can be cleared / garbage collected, it may well be retaining references to a lot of objects.
Calling Clear() and then setting the variable to null is pointless if you're really not doing anything else with the list. The GC can tell when you're not using a variable any more, in almost all cases.
Presumably you're judging by how much memory the process has reserved. I don't think the CLR will actually return memory to the operating system, but memory which has been freed by garbage collection will be available for further use within the CLR. (EDIT: as per the comments below, it's possible that the CLR frees areas of the Large Object Heap, but I don't know for sure.)
Clearing, nullifying, and collecting hardly ever have any (positive) effect. The GC automatically detects when objects are no longer referenced. Furthermore, as long as the Join operation runs, both the tmpApples and tmpOranges collections are referenced, and with them all their objects; they therefore cannot be collected.
A better solution would be to do the filtering in the database:
// NOTE that I removed the ToList operations
IQueryable<APPLES> tmpApples = dataContext.Apples;
IQueryable<ORANGES> tmpOranges = dataContext.Oranges;

List<APPLES> filteredApples = tmpApples
    .Join(tmpOranges, apples => apples.Id,
          oranges => oranges.Id, (apples, oranges) => apples)
    .ToList();
The reason this data is not collected is that although you are clearing the collection (so the collection no longer holds references to the items), the DataContext keeps its own references, and this causes the data to stay in memory.
You have to dispose of your DataContext as soon as you are done with it.
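A minimal sketch of that advice (the context type name is a placeholder, and it assumes the query is materialized with ToList before the context goes away):

List<APPLES> tmpApples;
using (var dataContext = new FruitDataContext())   // placeholder name for your context type
{
    // Materialize now; when the using block ends, the context and its
    // internal object tracking are disposed and can be collected.
    tmpApples = dataContext.Apples.ToList();
}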
UPDATE
OK, you have probably fallen victim to the large object issue.
Assuming this is a Large Object Heap issue, you could try not retrieving all the apples at once, but getting them in "packets" instead. So instead of calling
List<APPLES> apples = dataContext.Apples.ToList();
try storing the apples in separate lists:
int packetSize = 100;
List<APPLES> applePacket1 = dataContext.Apples.Take(packetSize).ToList();
List<APPLES> applePacket2 = dataContext.Apples.Skip(packetSize).Take(packetSize).ToList();
Does that help?
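Generalizing that idea as a loop (a sketch, not from the original answer; the OrderBy is needed because Skip/Take paging against a database requires a defined order, and the Id property is assumed from the question):

const int packetSize = 100;
for (int page = 0; ; page++)
{
    List<APPLES> packet = dataContext.Apples
        .OrderBy(a => a.Id)
        .Skip(page * packetSize)
        .Take(packetSize)
        .ToList();

    if (packet.Count == 0)
        break;                      // no more rows

    // process this packet, then let it go out of scope so it can be collected
}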
Use a profiler or SOS.dll to find out where your memory is going. If some operations take TOO much time, it sounds like you are swapping out to the page file.
EDIT: Also keep in mind that a Debug build delays the collection of local variables that are no longer referenced, to make debugging easier.
The only thing you're doing wrong is explicitly calling the garbage collector. You don't need to do this (in fact you shouldn't), and as Steven says you don't need to do anything to the collections anyway; they'll just go away, eventually.
If your concern is the performance of the 15-minute foreach loop, perhaps it is that loop which you should post. It is probably not related to the memory usage.

Threaded code execution time rises slowly. How to determine the culprit?

I have some code running in a thread. The code's main job is to call other methods, which write data to a SQL database, like this:
private void doWriteToDb()
{
    while (true)
    {
        try
        {
            if (q.Count == 0) qFlag.WaitOne();

            PFDbItem dbItem = null;
            lock (qLock)
            {
                dbItem = q.Dequeue();
            }

            if (dbItem != null)
            {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

                // write it off
                PFResult result = dbItem.Result;
                double frequency = dbItem.Frequency;
                int i = dbItem.InChannel;
                int j = dbItem.OutChannel;
                long detail, param, reading, res;

                detail = PFCoreMethods.AddNewTestDetail(DbTables, _dicTestHeaders[result.Name.ToString().ToLower()]);

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, frequency.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "In channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, i.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Out channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, j.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Spec");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, result.Spec);

                dbItem.Dispose();
                dqcnt++;
                sw.Stop();
            }
        }
        catch (Exception ex)
        {
            // NOTE: exceptions are swallowed here, so database errors go unnoticed.
        }
    }
}
The AddNewTestParameter method uses a 3rd-party class that contains the SQL code. Currently I have no access to its internals.
DbTables is a collection object whose properties are table objects created by the 3rd-party program. There is only one DbTables object, which the program uses.
The problem is that as time passes (a couple of hours) the AddNewTestParameter call takes longer and longer, starting at about 10 ms and growing to about 1 sec.
q is a queue of objects that contain the information needed to write to the database. Items are added to this queue by the main thread; the worker thread simply takes them out, writes them, and disposes of them. q.Count is normally no more than 1, although over time, as the database writes become slower, q.Count rises because dequeuing cannot keep up. At its worst, q.Count was over 30,000. I write over 150,000 entries to the database in total.
On the SQL end, I ran some traces on the server, and the trace shows that SQL itself always takes about 10 ms, even while the C# code takes 1 sec.
So, currently, I have two suspicions:
1. My code is the problem. The thread is low priority; perhaps this affects performance. Also, after watching the memory usage for 20 minutes, I see it rising at about 100K/min while CPU usage stays constant at around 2-5%. How can I figure out where the memory leak happens? Can I pinpoint it to a specific part of the code?
2. The 3rd-party code is the problem. How could I go about proving this? What methods are there to watch and confirm that the problem lies in the 3rd-party code?
Anyway, if I had to make a suggestion, I would look at DbTables: if that's a collection, maybe you're forgetting to reset it, so every time you call it it has one more element. After a while, a 3rd-party routine that's O(n^2), or something like that, starts to degrade because it expects a worst case of, say, 20 tables and you're providing 1,000.
Edit: OK, I would rule out the queue as the problem, since dequeuing should be a really fast operation (you can measure it anyway). That still points to the DbTables collection growing bigger and bigger; have you checked its size after the first x iterations?
Edit 2: OK, another approach. Let's say AddNewTestParameter does exactly what it says it does: ADDs a new parameter, which then gets added to an internal collection. If that's the case, there are two options: either you are supposed to clear that collection by calling a "ClearParameters" function after each iteration, in which case it is your fault, or no such functionality exists, in which case it is the 3rd-party code's fault. That would also explain your memory growth (although that can also be related to the growing queue).
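One way to test that suspicion (a sketch, not from the original answer; DbTables is assumed to expose some count of its table/parameter objects, so adjust the property to the real 3rd-party API): log the duration measured by the existing Stopwatch and the collection size every few thousand items, and see whether both numbers climb together.

// Inside the if (dbItem != null) block, after sw.Stop():
if (dqcnt % 1000 == 0)
{
    Console.WriteLine(
        "items={0}, last write took {1} ms, DbTables size={2}",
        dqcnt, sw.ElapsedMilliseconds, DbTables.Count);   // DbTables.Count is an assumption
}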
