I am having some trouble with while loops in C#. This happens, let's say, once per week.
I am looping through a SQL Server data reader, reading simple strings, doubles, etc. I don't understand why this happens only once in a while. If there were a problem in my code, its performance should ALWAYS be poor.
To be more specific, when this happens and I set a breakpoint, the thread breaks after 3 or 4 seconds. And the weirdest thing is that if I press F5, or if I step through the loop, it takes less time!
That's really weird and it's driving me crazy.
I am wondering if someone has ever encountered this kind of behavior.
Thanks in advance for your replies!
PS: The while loop is executed by a particular thread (not the main one).
PS 2: here is my code:
reader = DBConnect.GetInstance.ExecuteReader(request.ToString(), out connection, timeOut);
while (reader.Read())
{
    SourceInst sourceInst = new SourceInst();
    sourceInst.Load(reader);
    sourceInstList.Add(sourceInst);
}
Your database performance is likely varying, and that is what causes your issue. SqlDataReader streams results from the database; it does not buffer them in memory. Your assumption that the query is "executed before looping" is incorrect: the connection stays open and the loop reads one record at a time from the data source via a server-side cursor. So as your DB performance fluctuates, so does your loop's performance. The reader does not load and buffer all records into memory before the loop; you can do that yourself if you like, but I don't recommend it because of the memory usage.
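If you really do want all of the rows in memory before you start processing (the option mentioned above, with the memory caveat), a minimal sketch is to drain the reader into a DataTable first. This assumes the same DBConnect helper as in the question; LoadFromRow is a hypothetical variant of your Load method that reads from a DataRow instead of a reader:
using System.Data;

// Pull every row across the wire in one go; the loop below then runs purely in memory,
// so its speed no longer depends on fluctuating database performance.
var reader = DBConnect.GetInstance.ExecuteReader(request.ToString(), out connection, timeOut);
var table = new DataTable();
table.Load(reader);   // reads the result set to completion
reader.Close();
connection.Close();

foreach (DataRow row in table.Rows)
{
    SourceInst sourceInst = new SourceInst();
    sourceInst.LoadFromRow(row); // hypothetical: adapt your Load method to accept a DataRow
    sourceInstList.Add(sourceInst);
}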
I made a method that does some simple operations like +, -, *, /.
I need to run this method 1513 times.
First I try to run the method only once, to check that it works correctly and to see how long it takes to finish the operations.
Stopwatch st = new Stopwatch();
st.Start();
DiagramValue dv = new DiagramValue();
double pixel = dv.CalculateYPixel(23.46, diction);
st.Stop();
When I stop the stopwatch, it tells me the time is 0.06s.
When I run the same method 1513 times in a for loop like this:
Stopwatch st = new Stopwatch();
st.Start();
for (int i = 0; i < 1513; i++)
{
    DiagramValue dv = new DiagramValue();
    double pixel = dv.CalculateYPixel(23.46, diction);
}
st.Stop();
Then the Stopwatch tells me it took around 0.14s, or 0.14s / 1513 iterations ≈ 0.00009s per call.
My question is: why is the method so slow when I run it only once, yet running it about fifteen hundred times in a for loop takes almost the same total time?
Writing benchmarks is hard.
First, Stopwatch isn't infinitely accurate. When you run the method just once, you're very much limited by the accuracy of the underlying stopwatch. On the other hand, running the method multiple times alleviates this - you can get arbitrary precision by using a big enough loop. Instead of 1 vs 1513, compare e.g. 1500 vs. 3000. You'll get around 100% time increase, as expected.
Second, there's usually some cost with the first call in particular (e.g. JIT compilation) or with the memory pressure at the time of the call. That's why you usually need to do "preheating" - run the method outside of the stopwatch first to isolate these, and measure (multiple invocations) later.
Third, in a garbage collected environment like .NET, the guy who ordered the beer isn't necessarily the guy who pays the bill. Most of the cost of memory allocation in .NET is in the collection, rather than the allocation itself (which is about as cheap as a stack allocation). The collection usually happens outside of the code that caused the allocations in the first place, pointing you in the entirely wrong direction when searching for performance issues. That's why most .NET memory trackers display garbage collection separately - it's important to take account of, but can easily mislead you as to the cause if you're not careful.
There are many more issues, but these should cover your particular scenario well enough.
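To make the comparison concrete, here is a minimal benchmarking sketch along the lines described above: a warm-up call outside the measurement, then a large fixed number of iterations. DiagramValue, CalculateYPixel and diction are the names from the question; the iteration count is arbitrary.
using System;
using System.Diagnostics;

// Warm-up: the first call pays for JIT compilation, so keep it out of the measurement.
var warmup = new DiagramValue();
warmup.CalculateYPixel(23.46, diction);

const int iterations = 100000; // large enough to dwarf Stopwatch resolution
Stopwatch st = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    DiagramValue dv = new DiagramValue();
    double pixel = dv.CalculateYPixel(23.46, diction);
}
st.Stop();

Console.WriteLine("Total: {0} ms, per call: {1} ms",
                  st.ElapsedMilliseconds,
                  (double)st.ElapsedMilliseconds / iterations);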
Some possible reasons include:
Timing resolution. You get a more accurate figure when you find the mean over a large number of iterations.
Noise. The proportion of measured time that isn't the thing you actually want to record will be different.
Jitting. .NET compiles a method the first time it is used. As such, the first time it runs in a program's lifetime it will take longer, by a large factor (try running it once and then measuring the second attempt).
Branch prediction. If you keep doing the same thing with the same data, the CPU's branch predictor is going to get better at predicting which branches are taken.
GC stability. Not likely in this case, but possible. Often, at the start of a set of operations that requires particular objects to be created and then released, the program has to get more memory from the OS. A bit further into that set of operations it has more likely reached a steady state where it can get that memory just by cleaning out objects it isn't using any more, which is faster.
I have a console app that reads in a large text file with 40k+ lines; each line is a key that I use in a search, and the results are written to an output file. The issue is that I leave this console app running for a while until it suddenly closes, and I noticed that the process memory usage was really high, sitting at 1.6 GB when I last saw it crash.
I looked around and didn't find many answers. I did try gcAllowVeryLargeObjects, but that feels like I'm just dodging the problem.
Below is a snippet from my Main() of where I write out to the file. I can't understand why the memory usage gets so high. I flush the writer after every write (could it be because I'm keeping the file open for such a long period of time?).
TextWriter writer = new StreamWriter("output.csv", false);
foreach (var item in list)
{
    Console.WriteLine("{0}/{1}", count, numofitem);
    var result = TableServiceContext.Read(item.id);
    if (result != null)
    {
        writer.WriteLine(String.Join(",", result.id,
                                          result.code,
                                          result.hash));
    }
    count++;
    writer.Flush();
}
writer.Close();
Edit: I have 32 GB of RAM on my computer, so I am sure it's not failing because I don't have enough physical memory.
Edit 2: I changed the name of the repository class, as the original name was misleading.
If the average line length is 1 KB, then 40K lines is only 40 MB, which is nothing. That's why I'm pretty sure the problem is in your repository class. If it is an Entity Framework repository, try recreating the DbContext for each line.
If you want to tune the program further, you can use the following method: write timestamps to the console output (the Stopwatch class works well for this) and try recreating your repository every 10, 100, or N lines. Then, looking at the timestamps, you can find the optimal N to use.
var timer = Stopwatch.StartNew();
...
Console.WriteLine(timer.ElapsedMilliseconds);
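A rough sketch of that idea, using the loop from the question: MyRepository here is a hypothetical stand-in for whatever your repository class is (it only needs a Read method and, if it wraps a DbContext, a Dispose):
const int N = 100; // tune this by watching the timestamps
var timer = Stopwatch.StartNew();
var repository = new MyRepository(); // hypothetical repository type

int count = 0;
foreach (var item in list)
{
    if (count > 0 && count % N == 0)
    {
        repository.Dispose(); // drop the change tracker / cache accumulated so far
        repository = new MyRepository();
        Console.WriteLine("{0} rows, {1} ms", count, timer.ElapsedMilliseconds);
    }

    var result = repository.Read(item.id);
    if (result != null)
    {
        writer.WriteLine(String.Join(",", result.id, result.code, result.hash));
    }
    count++;
    writer.Flush();
}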
From looking at the code, I think the problem isn't the StreamWriter but some memory leak in your repository. Suggestions to check:
replace the repository with some dummy, e.g. a class DummyRepository with just the three properties id, code, hash (see the sketch after this list).
likewise create a long "list", e.g. 40k small entries.
run your program and see if it still consumes memory (I am pretty sure it will not).
then add back your original parts step by step and see which step causes the memory leak.
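A minimal sketch of such a dummy, with hypothetical names, so the write loop can run with no database involved at all:
// Hypothetical stand-in for the real repository: it returns fixed data and allocates
// nothing that can accumulate, so any remaining memory growth must come from elsewhere.
class DummyRecord
{
    public string id { get; set; }
    public string code { get; set; }
    public string hash { get; set; }
}

class DummyRepository
{
    private static readonly DummyRecord Fixed =
        new DummyRecord { id = "1", code = "abc", hash = "deadbeef" };

    public DummyRecord Read(string key)
    {
        return Fixed; // no database access, no per-call allocations
    }
}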
Update: The answers from Andrew and Conrad were both equally helpful. The easy fix for the timing issue fixed the problem, and caching the bigger object references instead of re-building them every time removed the source of the problem. Thanks for the input, guys.
I'm working with a C# .NET API, and for some reason the following code executes extremely slowly.
This is the handler for a System.Timers.Timer that triggers its elapsed event every 5 seconds.
private static void TimerGo(object source, System.Timers.ElapsedEventArgs e)
{
    tagList = reader.GetData(); // This is a collection of 10 objects.
    storeData(tagList); // This calls the 'storeData' method below
}
And the storeData method:
private static void storeData(List<obj> tagList)
{
    TimeSpan t = (DateTime.UtcNow - new DateTime(1970, 1, 1));
    long timestamp = (long)t.TotalSeconds;
    foreach (var tag in tagList)
    {
        string file = @"path\to\file" + tag.name + ".rrd";
        RRD dbase = RRD.load(file);
        // Update rrd with the current timestamp and data.
        dbase.update(timestamp, new object[1] { tag.data });
    }
}
Am I missing some glaring resource sink? The RRD stuff you see is from the NHawk C# wrapper for rrdtool; in this case I update 10 different files with it, but I see no reason why it should take so long.
When I say 'so long', I mean the timer was triggering a second time before the first update was done, so eventually "update 2" would happen before "update 1", which breaks things because "update 1" has a timestamp that's earlier than "update 2".
I increased the timer length to 10 seconds, and it ran for longer, but still eventually out-raced itself and tried to update a file with an earlier timestamp. What can I do differently to make this more efficient, because obviously I'm doing something drastically wrong...
This doesn't really answer your perf question, but if you want to fix the reentrancy bit, set your timer's AutoReset to false and then call Start() at the end of the method, e.g.
private static void TimerGo(object source, System.Timers.ElapsedEventArgs e)
{
    tagList = reader.GetData(); // This is a collection of 10 objects.
    storeData(tagList); // This calls the 'storeData' method below
    timer.Start();
}
Is there a different RRD file for each tag in your tagList? In your pseudo code you open each file N times. (You stated there are only 10 objects in the list, though.) Then you perform an update. I can only assume that you dispose of each RRD file after you have updated it; if you do not, you are keeping references to open files.
If the RRD file is the same and you are just putting different types of plot data into a single file, then you only need to keep it open for as long as you want exclusive write access to it.
Without profiling the code you have a few options (I recommend profiling btw)
Keep the RRD files open
Cache the opened files to avoid having to open, write, and close every 5 seconds for each file. Just cache the 10 opened file references and write to them every 5 seconds.
Separate the data collection from data writing
It appears you are taking metric samples from some object every 5 seconds. If you do not have something "tailing" your files, separate the collection from the writing: take your data sample and throw it into a queue to be processed. The processor will dequeue each tagList and write it as fast as it can, going back to the queue for more lists.
This way you can always be sure you are getting ~5 second samples even if the writing mechanism slows down.
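A minimal sketch of that producer/consumer split, assuming the tagList shape and storeData method from the question; BlockingCollection takes care of the thread-safe hand-off:
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Queue of samples waiting to be written; the timer handler only ever enqueues.
private static readonly BlockingCollection<List<obj>> pending = new BlockingCollection<List<obj>>();

private static void TimerGo(object source, System.Timers.ElapsedEventArgs e)
{
    // Sampling stays on its ~5 second schedule even if writing is slow.
    pending.Add(reader.GetData());
}

// Call once at startup; drains the queue on a background task and does the slow file writes.
private static void StartWriter()
{
    Task.Run(() =>
    {
        foreach (var tagList in pending.GetConsumingEnumerable())
        {
            storeData(tagList); // the existing write logic from the question
        }
    });
}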
Use a profiler. JetBrains is my personal recommendation. Run the profiler with your program and look for the threads / methods taking the longest time to run. This sounds very much like an IO or data issue, but that's not immediately obvious from your example code.
I just got some code handed over to me. The code is written in C# and it inserts realtime data into a database every second. The data is accumulated over time, which makes the numbers big.
The data is updated many times within the second; then at the end of the second the result is taken and inserted.
We used to address the DataSet rows directly within the second through their properties. For example, many operations like datavaluerow.meanvalue += mean; could take place.
After running the profiler we figured out that this was degrading performance because of the internal casting involved, so we created a 2D array of decimals on which the updates are carried out; the values are assigned to the DataRows only at the end of the second.
I ran the profiler again and found that it still takes a lot of time (although, added up, less than the time previously spent accessing the DataRows frequently).
The code that is executed at the end of the second is as follows:
public void UpdateDataRows(int tick)
{
    //ord
    //_table1Values is of type decimal[][]
    for (int i = 0; i < _table1Values.Length; i++)
    {
        _table1Values[i][(int)table1Enum.barDateTime] = tick;
        table1Row[i].ItemArray = _table1Values[i].Cast<object>().ToArray();
    }
    // this process is done for the other 10 tables
}
Is there a way to further improve this approach?
One obvious question: why do you have a 2D array of decimals when you're only updating them with integers? Could you get away with an int[][] instead?
Next, why are you accessing (int)table1Enum.barDateTime on each iteration? Given that there's a conversion involved there, you may find it helps if you extract that out of the loop.
However, I suspect the majority of the time is going to be spent in _table1Values[i].Cast<object>().ToArray(). Do you really need to do that? Taking a copy of the decimal[] (or int[]) would be faster than boxing every value on every iteration on every call - and then creating another array.
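As an illustration of both points, here is a hedged sketch (same field names as in the question, behaviour unchanged) that hoists the enum conversion out of the loop and fills a right-sized object[] directly instead of going through Cast<object>().ToArray():
public void UpdateDataRows(int tick)
{
    int dateTimeIndex = (int)table1Enum.barDateTime; // convert once, outside the loop

    for (int i = 0; i < _table1Values.Length; i++)
    {
        decimal[] source = _table1Values[i];
        source[dateTimeIndex] = tick;

        // Each value still gets boxed (ItemArray requires object[]), but this avoids
        // the LINQ iterator and the intermediate buffer growth of Cast<object>().ToArray().
        object[] row = new object[source.Length];
        for (int j = 0; j < source.Length; j++)
        {
            row[j] = source[j];
        }
        table1Row[i].ItemArray = row;
    }
}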
I need to continuously build large strings in a loop and save them to a database, which currently occasionally yields an OutOfMemoryException.
What is basically going on here is that I create a string using XmlWriter on top of a StringBuilder, based on some data. Then I call a method from an external library that converts this XML string to some other string. After that, the converted string is saved to the database. This whole thing is done repeatedly in a loop, about 100 times, for different data.
The strings themselves are not too big (below 500 KB each) and the process memory does not increase during this loop. But still, occasionally I get an OutOfMemoryException inside StringBuilder.Append. Interestingly, this exception does not result in a crash; I can catch it and continue the loop.
What is going on here? Why would I get an OutOfMemoryException although there is still enough free memory available in the system? Is this some GC heap problem?
Given that I can't avoid converting all these strings, what could I do to make this work reliably? Should I force a GC collection? Should I put a Thread.Sleep into the loop? Should I stop using StringBuilder? Should I simply retry when confronted with an OutOfMemoryException?
There is memory available, but no contiguous segment large enough for your StringBuilder's buffer. Bear in mind that each time the StringBuilder's buffer is too short, its size is doubled. If you can specify the final size in the constructor, that's better.
You MAY call GC.Collect() when you are done with a large collection of objects.
Actually, an OutOfMemoryException generally points to a design problem: you could use the hard drive (temp files) instead of memory, and you shouldn't allocate memory again and again (try to reuse objects/buffers/...).
I STRONGLY advise you to read the post "Out Of Memory" Does Not Refer to Physical Memory by Eric Lippert.
Try to reuse the StringBuilder object when you do the data generation.
Before or after each use, just reset the StringBuilder's length to 0 and start appending. This decreases the number of allocations and should make the OutOfMemory situation very rare.
To illustrate my point:
void MainProgram()
{
    StringBuilder builder = new StringBuilder(2 * 1024); // 2 KB
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
}

void PerformOperation(StringBuilder builder)
{
    builder.Length = 0;
    //
    // do the work here: builder.Append(...);
    //
}
With the sizes you mention you are probably running into Large Object Heap (LOH) fragmentation.
Reusing StringBuilder objects is not a direct solution, you need to get a grip on the underlying buffers.
If possible, calculate or estimate the size beforehand and pre-allocate.
And it could help to round allocations up, say to multiples of 20 KB or so; that could improve buffer reuse.
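For example, a small sketch of that rounding idea; the 20 KB granularity is just the figure mentioned above, and expectedLength stands for whatever size estimate you have for the current document:
// Round an estimated capacity up to a fixed granularity so that repeated runs request
// one of a small set of buffer sizes, which the Large Object Heap can reuse more easily.
static int RoundUpCapacity(int estimatedChars, int granularity = 20 * 1024)
{
    return ((estimatedChars + granularity - 1) / granularity) * granularity;
}

// Usage: size the builder once, up front, instead of letting it double repeatedly.
StringBuilder builder = new StringBuilder(RoundUpCapacity(expectedLength));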