And yet another question about yield return
I need to execute different SQL scripts remotely. The scripts are stored in TFS, so I retrieve them from TFS automatically; the process iterates through all the files, reading their content into memory and sending it to the remote SQL servers.
So far the process works flawlessly. But now some of the scripts will contain bulk inserts, increasing the size of a script to 500,000 MB or more.
I built the code "thinking" that I was reading the content of one file at a time into memory, but now I have second thoughts.
This is what I have (over simplified):
public IEnumerable<SqlScriptSummary> Find(string scriptsPath)
{
    if (!Directory.Exists(scriptsPath))
    {
        throw new DirectoryNotFoundException(scriptsPath);
    }
    var path = new DirectoryInfo(scriptsPath);
    return path.EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
        .Select(x =>
        {
            var script = new SqlScriptSummary
            {
                Name = x.Name,
                FullName = x.FullName,
                Content = File.ReadAllText(x.FullName, Encoding.Default)
            };
            return script;
        });
}
....
public void ExecuteScripts(string scriptsPath)
{
    foreach (var script in Find(scriptsPath))
    {
        _scriptRunner.Run(script.Content);
    }
}
My understanding is that EnumerateFiles yields one file at a time, which is what made me "think" I was loading one file's content into memory at a time.
But...
While I'm iterating in the ExecuteScripts method, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If it remains in memory, then even though I'm using iterators (and yield return internally), all the scripts are still in memory once I've iterated through them, right? So in the end it would be like using ToList, just with lazy execution. Is that right?
If the script variable is released when it goes out of scope, then I think I'm fine.
How could I redesign the code to optimize memory consumption, i.e. force only one script's content to be loaded into memory at a time?
Additional questions:
How can I test (unit/integration test) that I'm loading just one script at a time in memory?
How can I test (unit/integration test) that each script is released/not released from memory?
While I'm iterating them, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If you mean in the ExecuteScripts method - there's nothing to dispose, unless SqlScriptSummary implements IDisposable, which seems unlikely. However, there are two different things here:
The script variable goes out of scope after the foreach loop, and can't act as a GC root
Each object that the script variable has referred to will be eligible for garbage collection when nothing else refers to it... including script on the next iteration.
So yes, basically that should be absolutely fine. You'll be loading one file at a time, and I can't see any reason why there'd be more than one file's content in memory at a time, in terms of objects that the GC can't collect. (The GC itself is lazy, so it's unlikely that there'd be exactly one script in memory at a time, but you don't need to worry about that side of things, as your code makes sure that it doesn't keep live references to more than one script at a time.)
The way you can test that you're only loading a single script at a time is to try it with a large directory of large scripts (that don't actually do anything). If you can process more scripts than you have memory, you're fine :)
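On the redesign question: if you wanted to go further and not hold even one script's content until the moment it runs, one hypothetical variation is to expose the content as a delegate instead of an eagerly read string. This is only a sketch; GetContent is an illustrative member replacing Content, not part of the original class.
public class SqlScriptSummary
{
    public string Name { get; set; }
    public string FullName { get; set; }
    // Hypothetical: read the file only when the script is actually executed.
    public Func<string> GetContent { get; set; }
}

// in Find(), inside the Select:
var script = new SqlScriptSummary
{
    Name = x.Name,
    FullName = x.FullName,
    GetContent = () => File.ReadAllText(x.FullName, Encoding.Default)
};

// in ExecuteScripts():
_scriptRunner.Run(script.GetContent());
As noted above, the existing code already keeps at most one script's content reachable at a time, so this is optional rather than necessary.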
Related
I am writing a multi-threaded solution that will be used to transfer data from different sources into a central database. The solution, in general, has two parts:
A single-threaded Import engine
A multi-threaded client that invokes the Import engine on multiple threads.
To minimize custom development I am using Roslyn scripting, enabled via the NuGet package manager in the Import engine project.
Every import is defined as a transformation of an input table – which has a collection of input fields – to a destination table – again with a collection of destination fields.
The scripting engine is used to allow custom transformations between input and output. For every input/output pair there is a text field with a custom script. Here is simplified code used for script initialization:
//Instance of class passed to script engine
_ScriptHost = new ScriptHost_Import();
if (Script != "") //Here we have script fetched from DB as text
{
    try
    {
        //We are creating script object …
        ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
        //… and we are compiling it upfront to save time since this might be invoked multiple times.
        ScriptObject.Compile();
        IsScriptCompiled = true;
    }
    catch
    {
        IsScriptCompiled = false;
    }
}
Later we will invoke this script with:
async Task<string> RunScript()
{
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}
So, after the import definition is initialized – where we might have any number of input/output pair descriptions, each with its own script object – the memory footprint increases by approximately 50 MB per pair where scripting is defined.
A similar pattern is applied to the validation of destination rows before they are stored in the DB (every field might have several scripts that check the validity of the data).
All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need several threads, memory usage becomes very high, and 99% of it is used by scripting.
If the Import engine is hosted in a WCF-based middle layer (which is what I did), we quickly stumble upon an "Insufficient memory" problem.
The obvious solution would be to have one scripting instance that somehow dispatches code execution to a specific function inside the script depending on the need (input/output transformation, validation or something else). I.e. instead of script text for every field we would have a SCRIPT_ID that is passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code that should execute and return the appropriate value.
The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where it is used.
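A rough sketch of that dispatch idea (ScriptId and InputValue are illustrative members invented for this example, not existing parts of ScriptHost_Import):
public class ScriptHost_Import
{
    public int ScriptId { get; set; }      // selects which transformation to run
    public string InputValue { get; set; } // the value being transformed
}

// One script text shared by every field, compiled once:
string script = @"
    switch (ScriptId)
    {
        case 1: return InputValue.Trim();
        case 2: return InputValue.ToUpperInvariant();
        default: return InputValue;
    }";
var scriptObject = CSharpScript.Create<string>(script, globalsType: typeof(ScriptHost_Import));
scriptObject.Compile();

// Per field, only the ScriptId changes:
string result = (await scriptObject.RunAsync(new ScriptHost_Import { ScriptId = 2, InputValue = "abc" })).ReturnValue;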
Before implementing this change, I would like to hear opinions about this solution and suggestions for a different approach.
As it seems, using scripting for this job might be wasteful overkill – you go through many application layers and memory fills up.
Other solutions:
How do you interface with the DB? You could manipulate the query itself according to your needs instead of writing a whole script for that.
How about using generics, with enough type parameters to fit your needs:
public class ImportEngine<T1, T2, T3, T4, T5>
Using Tuples (which is pretty much like using generics)
But if you still think scripting is the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application rather than inside RunAsync. You can do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy, memory-hungry RunAsync call. Here is an example:
Instead of simply (the script string):
DoSomeWork();
You can do this (IHaveWork is an interface defined in your app, with a single method, Work):
public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();
This way you call the heavy RunAsync only once, for a short period, and it returns a worker that you can reuse inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).
The pattern also breaks the isolation between your app and the script, so you can easily pass data to and get data from the script.
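For completeness, the host side might look roughly like this (a sketch; it assumes scriptText holds the script above and that the assembly defining IHaveWork is referenced in the ScriptOptions):
var state = await CSharpScript.RunAsync<IHaveWork>(
    scriptText,
    ScriptOptions.Default.WithReferences(typeof(IHaveWork).Assembly));
IHaveWork worker = state.ReturnValue;

// The expensive RunAsync happens once; after that the work is a plain method call.
worker.Work();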
EDIT
Some quick benchmark:
This code:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\");";
    List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}
Consumes about ~600 MB in my environment (System.Windows.Forms is referenced in the ScriptOptions just to give the scripts some size).
It reuses the Script<object> instances - no more memory is consumed on the second call to RunAsync.
But we can do better:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "return () => { System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\"); };";
    List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
        await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}
In this version I'm simplifying the solution I proposed a bit, returning an Action object, but I think the performance impact is small (in a real implementation I really think you should use your own interface to keep it flexible).
When the scripts are running you can see a steep rise in memory to ~240 MB, but after I call the garbage collector (for demonstration purposes; I did the same in the previous code) the memory usage drops back to ~30 MB. It is also faster.
I am not sure whether this existed when the question was asked, but there is a very similar and, let's say, official way to run scripts multiple times without increasing the program's memory: use the CreateDelegate method, which does exactly what you would expect.
I will post it here just for convenience:
var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}
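The Globals type referenced above is not shown; it is assumed to be a plain class exposing the script's global variables, roughly:
public class Globals
{
    public int X;
    public int Y;
}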
It takes some memory initially, but keep the runner in some global list and you can invoke it quickly later.
After reading a bit about how yield, foreach, LINQ deferred execution and iterators work in C#, I decided to give it a try and optimize an attribute-based validation mechanism inside a small project. The result:
private IEnumerable<string> GetPropertyErrors(PropertyInfo property)
{
    // where Entity is the current object instance
    string propertyValue = property.GetValue(Entity)?.ToString();
    foreach (var attribute in property.GetCustomAttributes().OfType<ValidationAttribute>())
    {
        if (!attribute.IsValid(propertyValue))
        {
            yield return $"Error: {property.Name} {attribute.ErrorMessage}";
        }
    }
}

// inside another method
foreach (string error in GetPropertyErrors(property))
{
    // Some display/insert log operation
}
I find this slow, but that could also be due to reflection or a large number of properties to process.
So my question is... Is this optimal, or a good use of the lazy-loading mechanic? Or am I missing something and just wasting tons of resources?
NOTE: The intention of the code itself is not important; my concern is the use of lazy loading in it.
Lazy loading is not something specific to C# or to Entity Framework. It's a common pattern which allows you to defer some data loading. Deferring means not loading immediately. Some examples of when you need that:
Loading images in a (Word) document. The document may be big and can contain thousands of images. If you load them all when the document is opened it might take a long time. Nobody wants to sit and watch a document load for 30 seconds. The same approach is used in web browsers - resources are not sent with the body of the page; the browser defers loading them.
Loading graphs of objects. These may be objects from a database, file system objects, etc. Loading the full graph might be equivalent to loading the entire database content into memory. How long will that take? Is it efficient? No. If you are building a file system explorer, would you load information about every file in the system before you start using it? It's much faster to load information about the current directory only (and probably its direct children).
Lazy loading doesn't always mean deferring loading until you really need the data. Loading might occur on a background thread before you really need it, e.g. you might never scroll to the bottom of a web page to see the footer image. Lazy loading only means deferring. And C# enumerators can help you with that. Consider getting the list of files in a directory:
string[] files = Directory.GetFiles("D:");
IEnumerable<string> filesEnumerator = Directory.EnumerateFiles("D:");
The first approach returns an array of files. This means the directory has to get all its files and save their names into an array before you can get even the first file name. It's like loading all images before you can see the document.
The second approach uses an enumerator - it returns files one by one as you ask for the next file name. This means the enumerator is returned immediately, without getting all the files and saving them into some collection, and you can process the files one by one when you need to. Here, getting the file list is deferred.
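As a small illustration, with the enumerator you can stop as soon as you have what you need, without the directory ever being listed in full (a sketch; the ".log" filter is arbitrary):
string firstLog = Directory.EnumerateFiles("D:")
    .FirstOrDefault(f => f.EndsWith(".log", StringComparison.OrdinalIgnoreCase));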
But you should be careful. If the underlying operation is not deferred, then returning an enumerator gives you no benefit. E.g.
public IEnumerable<string> EnumerateFiles(string path)
{
    foreach (string file in Directory.GetFiles(path))
        yield return file;
}
Here you use the GetFiles method, which fills an array of file names before returning. So yielding the files one by one gives you no speed benefit.
By the way, in your case you have exactly the same problem - the GetCustomAttributes extension internally uses the Attribute.GetCustomAttributes method, which returns an array of attributes. So you will not reduce the time it takes to get the first result.
This isn't quite how the term "lazy loading" is generally used in .NET. "Lazy loading" is most often used of something like:
public SomeType SomeValue
{
    get
    {
        if (_backingField == null)
            _backingField = RelativelyLengthyCalculationOrRetrieval();
        return _backingField;
    }
}
As opposed to just having _backingField set when an instance was constructed. Its advantage is that it costs nothing in the cases when SomeValue is never accessed, at the expense of a slightly greater cost when it is. It's therefore advantageous when the chances of SomeValue not being called are relatively high, and generally disadvantageous otherwise with some exceptions (when we might care about how quickly things are done in between instance creation and the first call to SomeValue).
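For reference, the same pattern can be written with the framework's Lazy<T>, which is also thread-safe by default (a minimal sketch reusing the names above):
// Note: if RelativelyLengthyCalculationOrRetrieval is an instance method,
// initialise this in the constructor rather than in a field initialiser.
private readonly Lazy<SomeType> _lazyValue =
    new Lazy<SomeType>(RelativelyLengthyCalculationOrRetrieval);

public SomeType SomeValue => _lazyValue.Value;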
Here we have deferred execution. It's similar, but not quite the same. When you call GetPropertyErrors(property) rather than receiving a collection of all of the errors you receive an object that can find those errors when asked for them.
It will always save the time taken to get the first such item, because it allows you to act upon it immediately rather than waiting until it has finished processing.
It will always reduce memory use, because it isn't spending memory on a collection.
It will also save time in total, because no time is spent creating a collection.
However, if you need to access the results more than once, then while a collection will still hold the same results, the deferred enumerable will have to calculate them all again (unlike lazy loading, which loads its results once and stores them for subsequent reuse).
If you're rarely going to want to hit the same set of results, it's generally always a win.
If you're almost always going to want to hit the same set of results, it's generally a lose.
If you are sometimes going to want to hit the same set of results though, you can pass the decision on whether to cache or not up to the caller, with a single use calling GetPropertyErrors() and acting on the results directly, but a repeated use calling ToList() on that and then acting repeatedly on that list.
As such, the approach of not sending a list is the more flexible, allowing the calling code to decide which approach is the more efficient for its particular use of it.
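Concretely, that choice at the call site looks something like this (log and Display are illustrative placeholders):
// Single pass: act on each error as it is produced; nothing is cached.
foreach (string error in GetPropertyErrors(property))
    log.Add(error);

// Repeated use: materialise once with ToList(), then reuse the list.
List<string> errors = GetPropertyErrors(property).ToList();
if (errors.Count > 0)
    Display(errors);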
You could also combine it with lazy loading:
private IEnumerable<string> LazyLoadedEnumerator()
{
    if (_store == null)
        return StoringCalculatingEnumerator();
    return _store;
}
private IEnumerable<string> StoringCalculatingEnumerator()
{
    List<string> store = new List<string>();
    foreach (string str in SomethingThatCalculatesTheseStrings())
    {
        yield return str;
        store.Add(str);
    }
    _store = store;
}
This combination is rarely useful in practice though.
As a rule, start with deferred evaluation as the normal approach and decide further up the call chain whether to store the results or not. An exception though is if you can know the size of the results before you begin (you can't here, because you don't know whether an element will be added until you've examined the property). In that case there is the possibility of a performance improvement in just how you create the list, because you can set its capacity ahead of time. This though is a micro-optimisation that only applies if you also know you'll always want to work on a list, and it doesn't save that much in the grand scheme of things.
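For what it's worth, that micro-optimisation is just the list constructor overload that takes a capacity (expectedCount is a placeholder for a size you would have to know up front):
var results = new List<string>(expectedCount); // avoids the grow-and-copy steps while filling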
I am trying to build some objects and insert them into a database. The number of records that have to be inserted is big - on the order of millions.
The insert is done in batches.
The problem I am having is that I need to initialize new objects, add them to a list, and at the end do a bulk insert of the list into the database. Because I am initializing a huge number of objects, my computer's memory (RAM) fills up and everything more or less freezes.
The question is:
From a memory point of view, should I initialize new objects or set them to null?
Also, I am trying to work with the same object reference. Am I doing it right?
Code:
QACompleted completed = new QACompleted();
QAUncompleted uncompleted = new QAUncompleted();
QAText replaced = new QAText();
foreach (QAText question in questions)
{
    MatchCollection matchesQ = rgx.Matches(question.Question);
    MatchCollection matchesA = rgx.Matches(question.Answer);
    foreach (GetKeyValues_Result item in values)
    {
        hasNull = false;
        replaced = new QAText(); <- this object
        if (matchesQ.Count > 0)
        {
            SetQuestion(matchesQ, replaced, question, item);
        }
        else
        {
            replaced.Question = question.Question;
        }
        if (matchesA.Count > 0)
        {
            SetAnswer(matchesA, replaced, question, item);
        }
        else
        {
            replaced.Answer = question.Answer;
        }
        if (!hasNull)
        {
            if (matchesA.Count == 0 && matchesQ.Count == 0)
            {
                completed = new QACompleted(); <- this object
                MapEmpty(replaced, completed, question.Id);
            }
            else
            {
                completed = new QACompleted(); <- this object
                MapCompleted(replaced, completed, question.Id, item);
            }
            goodResults.Add(completed);
        }
        else
        {
            uncompleted = new QAUncompleted(); <- this object
            MapUncompleted(replaced, uncompleted, item, question.Id);
            badResults.Add(uncompleted);
        }
    }
    var success = InsertIntoDataBase(goodResults, "QACompleted");
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}
I have marked the objects. Should I just set them like replaced = null, or should I use the constructor?
What would be the difference between new QAText() and = null?
The memory cost of creating objects
Creating objects in C# always has a memory cost. This relates to the memory layout of an object. Assuming you are on a 64-bit OS, the runtime has to allocate an extra 8 bytes for the sync block and 8 bytes for the method table pointer. After the sync block and method table pointer come your own data fields. Besides the unavoidable 16-byte header, objects are always aligned to an 8-byte boundary and can therefore incur extra padding overhead.
You can roughly estimate the memory overhead if you know exactly how many objects you create. However, I would suggest being careful about assuming that your memory pressure comes from object layout overhead, which is also why I suggest estimating that overhead as a first step. You might end up realizing that even if the layout overhead could magically be removed completely, it would not make a huge difference to memory performance. After all, for a million objects, the object-header overhead is only 16 MB.
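If you want to put a number on this for your own types, a rough empirical check is easy enough (a sketch; the result varies with runtime, bitness and whatever fields QAText actually has):
long before = GC.GetTotalMemory(forceFullCollection: true);
var items = new QAText[1000000];
for (int i = 0; i < items.Length; i++)
{
    items[i] = new QAText();
}
long after = GC.GetTotalMemory(true);
Console.WriteLine($"~{(after - before) / items.Length} bytes per instance, header and alignment included");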
The difference between replaced = new QAText() and replaced = null
I suppose that after you set replaced to null you still have to create another QAText()? If so, memory-wise there is no real difference to the garbage collector. The old QAText instance will be collected either way, provided nothing else references it. When to collect the instance, however, is the garbage collector's call; doing replaced = null will not make the collection happen earlier.
You could try to reuse the same QAText instance instead of creating a new one every time, but creating a new one each time will not result in high memory pressure. It will make the GC a little busier and therefore result in slightly higher CPU usage.
Identify the real cause for high memory usage
If your application really is using a lot of memory, you have to look at the design of your QACompleted and QAUncompleted objects. Those are the objects added to the lists, and they occupy memory until you submit them to the database. If those objects are designed well (they only take the memory they have to take), then, as Peter pointed out, you should use a smaller batch size so you don't have to keep too many of them in memory.
There are other factors in your program that can possibly cause unexpected memory usage. What data structure are goodResults and badResults? Are they List or LinkedList? Internally a List is nothing but a dynamic array. It uses a growth policy that doubles its size whenever it is full, and that always-double policy can eat up memory quickly, especially when you have a lot of entries.
LinkedList, on the other hand, does not suffer from the above-mentioned problem, but every single node requires roughly 40 extra bytes.
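If you know roughly how many items a batch will hold, you can sidestep the doubling by pre-sizing the lists (the capacity here is an assumed figure, not taken from the original code):
var goodResults = new List<QACompleted>(100000);
var badResults = new List<QAUncompleted>(100000);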
It is also worth checking what the MapCompleted and MapUncompleted methods are doing. Do they keep a long-lived reference to the replaced object? If so, that will cause a memory leak.
In summary, when dealing with memory problems you should focus on macro-scale issues such as the choice of data structures or memory leaks, or optimize your algorithms so that you don't have to keep all the data in memory all the time.
Instantiating a new (albeit empty) object always takes some memory, as space has to be allocated for the object's fields. If you aren't going to access or set any data on the instance, I see no point in creating it.
It's unfortunate that the code example is not written better. There seem to be lots of declarations left out, and undocumented side-effects in the code. This makes it very hard to offer specific advice.
That said…
Your replaced object does not appear to be retained beyond one iteration of the loop, so it's not part of the problem. The completed and uncompleted objects are added to lists, so they do add to your memory consumption. Likewise the goodResults and badResults lists themselves (where are the declarations for those?).
If you are using a computer with too little RAM, then yes...you'll run into performance issues as Windows uses the disk to make up for the lack of RAM. And even with enough RAM, at some point you could run into .NET's limitations with respect to object size (i.e. you can only put so many elements into a list). So one way or the other, you seem to need to reduce your peak memory usage.
You stated that when the data in the lists is inserted into the database, the lists are cleared. So presumably that means that there are so many elements in the values list (one of the undeclared, undocumented variables in your code example) that the lists and their objects get too large before getting to the end of the inner loop and inserting the data into the database.
In that case, it seems likely that the simplest way to address the issue is to submit the updates in batches inside the inner foreach loop. E.g. at the end of that loop, add something like this:
if (goodResults.Count >= 100000)
{
    var success = InsertIntoDataBase(goodResults, "QACompleted");
    goodResults.Clear(); // release the batch so its objects can be collected
}
if (badResults.Count >= 100000)
{
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
    badResults.Clear();
}
(Declaring the actual cut-off as a named constant of course, and handling the database insert result return value as appropriate).
Of course, you would still do the insert at the end of the outer loop too.
EDIT: Problem wasn't related to the question. It was indeed something wrong with my code, and actually, it was so simple that I don't want to put it on the internet. Thanks anyway.
I read in roughly 550k Active Directory records and store them in a List, the class being a simple wrapper for an AD user. I then split the list of ADRecords into four lists, each containing a quarter of the total. After I do this, I read about 400k records from a database, known as EDR records, into a DataTable. I take the four quarters of my list and spawn four threads, passing each one a quarter. I have to match the AD records to the EDR records using email right now, but we plan to add more things to match on later.
I have a foreach on the list of AD records, and inside of that, I have to run a for loop on the EDR records to check each one, because if an AD record matches more than one EDR record, then that isn't a direct match, and should not be treated as a direct match.
My problem: by the time I get to this foreach on the list, my ADRecords list only has about 130 records in it, yet right after I pull them all in, I Console.WriteLine the count and it's 544k.
I am starting to think that even though I haven't set the list to null to be collected later, C# or Windows or something is actually taking my list away to make room for the EDR records because I haven't used the list in a while. The database that I have to use to read EDR records is a linked server, so it takes about 10 minutes to read them all in, so my list is actually idle for 10 minutes, but it's never set to null.
Any ideas?
//splitting list and passing in values to threads.
List<ADRecord> adRecords = GetAllADRecords();
for (int i = 0; i < adRecords.Count/4; i++)
{
    firstQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/4; i < adRecords.Count/2; i++)
{
    secondQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/2; i < (adRecords.Count/4)*3; i++)
{
    thirdQuarter.Add(adRecords[i]);
}
for (int i = (adRecords.Count/4)*3; i < adRecords.Count; i++)
{
    fourthQuarter.Add(adRecords[i]);
}
DataTable edrRecordsTable = GetAllEDRRecords();
DataRow[] edrRecords = edrRecordsTable.Select("Email_Address is not null and Email_Address <> ''", "Email_Address");
Dictionary<string, int> letterPlaces = FindLetterPlaces(edrRecords);
Thread one = new Thread(delegate() { ProcessMatches(firstQuarter, edrRecords, letterPlaces); });
Thread two = new Thread(delegate() { ProcessMatches(secondQuarter, edrRecords, letterPlaces); });
Thread three = new Thread(delegate() { ProcessMatches(thirdQuarter, edrRecords, letterPlaces); });
Thread four = new Thread(delegate() { ProcessMatches(fourthQuarter, edrRecords, letterPlaces); });
one.Start();
two.Start();
three.Start();
four.Start();
In ProcessMatches, there is a foreach on the List of ADRecords passed in. The first line in the foreach is AdRecordsProcessed++; which is a global static int, and the program finishes with it at 130 instead of the 544k.
The variable is never set to null and is still in scope? If so, it shouldn't be collected and idle time isn't your problem.
First issue I see is:
AdRecordsProcessed++;
Are you locking that global variable before updating it? If not, and depending on how fast the records are processed, it's going to be lower than you expect.
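If all you need is an accurate count, an atomic increment avoids the race without taking a lock (this assumes AdRecordsProcessed is a static int field, as described):
// instead of: AdRecordsProcessed++;
System.Threading.Interlocked.Increment(ref AdRecordsProcessed);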
Try running it from a single thread (i.e. pass in adRecords instead of firstQuarter and don't start the other threads.) Does it work as expected with 1 thread?
Firstly, you don't set a list to null. What you might do is set every reference to a list to null (or to another list), or all such references might simply fall out of scope. This may seem like a nitpick point, but if you are having to examine what is happening to your data it's time to be nitpicky on such things.
Secondly, getting the GC to deallocate something that has a live reference is pretty hard to do. You can fake it with a WeakReference<>, or think you've found it when you hit a bug in a finaliser (because the reference isn't actually live, and even then it's a matter of the finaliser trying to deal with a finalised rather than deallocated object). Bugs can happen everywhere, but it is highly unlikely that you've found a way to make the GC deallocate something that is live.
The GC will likely do two things with your list:
It is quite likely to compact the memory used by it, which will move its component items around.
It is quite likely to promote it to a higher generation.
Neither of these will cause any change you can detect unless you actually look for it (obviously you'll notice a change in generation if you keep calling GC.GetGeneration(), but aside from that you aren't really going to).
The memory used could also be paged out, but it will be paged back in when you go to use the objects. Again, no effect you will notice.
Finally, if the GC did deallocate something, you wouldn't have a reduced number of items, you'd have a crash, because if objects just got deallocated the system would still try to use the supposedly live references to them.
So, while the GC or the OS may do something to make room for your other object, it isn't something observable in code, and it does not stop the object from being available and in the same programmatic state.
Something else is the problem.
Is there a reason you have to get all the data all at once? If you break the data up into chunks it should be more manageable. All I know is having to get into GC stuff is a little smelly. Best to look at refactoring your code.
The garbage collector will not collect:
A global variable
Objects managed by static objects
A local variable
A variable referencable by any method on the call stack
So if you can reference it from your code, there is no possibility that the garbage collector collected it. No way, no how.
In order for the collector to collect it, all references to it must have gone away. And if you can see it, that's most definitely not the case.
I have some code in a thread. The code's main function is to call other methods, which write stuff to a SQL database like this:
private void doWriteToDb()
{
    while (true)
    {
        try
        {
            if (q.Count == 0) qFlag.WaitOne();
            PFDbItem dbItem = null;
            lock (qLock)
            {
                dbItem = q.Dequeue();
            }
            if (dbItem != null)
            {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
                //write it off
                PFResult result = dbItem.Result;
                double frequency = dbItem.Frequency;
                int i = dbItem.InChannel;
                int j = dbItem.OutChannel;
                long detail, param, reading, res;
                detail = PFCoreMethods.AddNewTestDetail(DbTables, _dicTestHeaders[result.Name.ToString().ToLower()]);
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, frequency.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "In channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, i.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Out channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, j.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Spec");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, result.Spec);
                dbItem.Dispose();
                dqcnt++;
                sw.Stop();
            }
        }
        catch (Exception ex)
        {
        }
    }
}
The AddNewTestParameter method uses a 3rd-party class that contains the SQL code. Currently I have no access to its internals.
DbTables is a collection object whose properties are table objects created by the 3rd-party program. There is only one DbTable object, which the program uses.
The problem is that as time passes (a couple of hours) the AddNewTestParameter call takes longer and longer, going from about 10 ms to about 1 sec.
The q is a queue of objects that contain the information needed to write to the database. Items are added to this queue by the main thread; the worker thread simply takes them out, writes them, and disposes of them. q.Count is normally no more than 1, although over time, as the database writes become slower, q.Count rises since dequeueing cannot keep up. At its worst, q.Count was over 30,000. I write over 150,000 entries to the database in total.
On the SQL end, I ran some traces on the server, and the trace shows that internally SQL always takes about 10ms, even during the time the C# code itself takes 1sec.
So, currently, I have 2 suspicions:
My code is the problem. The thread is low priority; perhaps that affects performance. Also, after watching the memory usage for 20 minutes, I see it rising at about 100K/min, while CPU usage seems constant at around 2-5%. How can I figure out where the memory leak happens? Can I pinpoint it to a specific part of the code?
The 3rd party code is the problem. How could I go about proving this? What methods are there to watch and confirm that the problem lies in 3rd party code?
Anyway, if I had to make a suggestion I would look at DbTables ... if that's a collection, maybe you're forgetting to reset it, so every time you call it it has one more element... so after a while the 3rd-party routine, which is O(n^2) or something like that, starts to degrade because it's expecting a worst-case scenario of 20 tables and you're providing 1,000.
Edit: OK, I would rule out the problem being in the queue, as dequeuing should be a really fast operation (you can measure it anyway; see the sketch below). It still points to the DbTables collection growing bigger and bigger - have you checked its size after the first x iterations?
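A low-tech way to measure both, using only what is already in the loop (logEvery and the console output are illustrative choices):
const int logEvery = 1000;
var callTimer = new System.Diagnostics.Stopwatch();

// around the suspect call:
callTimer.Restart();
param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
callTimer.Stop();

// once per batch of items, log the trend:
if (dqcnt % logEvery == 0)
{
    Console.WriteLine($"items={dqcnt}, queue={q.Count}, last AddNewTestParameter={callTimer.ElapsedMilliseconds} ms");
}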
Edit2: OK, another approach: let's say AddNewTestParameter does exactly what it says it does... ADDs a new parameter that then gets added to an internal collection. If that's the case there are two options: either you're supposed to clear that collection by calling some "ClearParameters" function after each iteration, in which case it's your fault, or no such functionality exists, in which case it's the 3rd-party code's fault. That would also explain your memory loss (although that can also be related to the growing queue).