I'm trying the new 9.0.0 RC1 release of SharpSNMP for its async methods. It's easy to use - practically a drop-in replacement for the old synchronous methods.
My code to poll a list of OIDs asynchronously is:
// create a new get request message
var message = new GetRequestMessage(Messenger.NextRequestId, VersionCode.V2, SNMPReadCommunity, oids);

// get a new socket
using (Socket udpSocket = SNMPManager.GetSocket())
{
    // wait for the response (this is async)
    var res = await message.GetResponseAsync(SNMPManager, new UserRegistry(), udpSocket);

    // check the variables we received
    CheckSnmpResults(res.Pdu().Variables);
}
I limit the number of OIDs per get-request to 25. My application connects to c.50 SNMP devices. Every 5 minutes a timer ticks and runs the above code several times in a loop in order to poll c.100 OIDs on each device. All good.
The problem is that the message.GetResponseAsync method is leaking memory. Every poll run adds 6 or 7 MB to my application's memory usage. Using the VS2015 memory profiler, I can see a large number of OverlappedData objects, each 65K, the number of which increases every time I run message.GetResponseAsync. So running this to receive c.200 SNMP get-requests every 5 minutes means my application's memory use quickly rockets.
Am I using message.GetResponseAsync incorrectly somehow? Is this a bug in SharpSNMPLib?
Thanks,
Giles
A temporary answer for now.
The leak is caused by the fact that SocketAsyncEventArgs is not reused. This kind of object should be reused (as should the Socket object) when a manager performs multiple operations against an agent.
The current design does not allow such reuse, so an overall redesign is needed.
I already have some ideas on how to move forward, but they probably won't make it into the 9.0 release. Let's see if 9.5 can be the first release with the new design; I will then come back and update this answer.
Updated: this commit contains a quick fix to dispose the args object. But it does not enable reuse yet.
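Until that redesign lands, the reuse described above would look roughly like this on the caller's side - one long-lived socket per agent instead of one per request. This is only a sketch (the `oidBatches` collection is an assumed stand-in for the question's batches of 25 OIDs), and it does not eliminate the library's internal SocketAsyncEventArgs allocations:

```csharp
// Caller-side sketch: reuse one socket for all requests to the same agent.
// SNMPManager, SNMPReadCommunity and CheckSnmpResults come from the question;
// oidBatches is a hypothetical list of OID batches (25 OIDs each).
using (Socket udpSocket = SNMPManager.GetSocket())
{
    foreach (var oidBatch in oidBatches) // e.g. 4 batches of 25 OIDs per device
    {
        var message = new GetRequestMessage(
            Messenger.NextRequestId, VersionCode.V2, SNMPReadCommunity, oidBatch);
        var res = await message.GetResponseAsync(
            SNMPManager, new UserRegistry(), udpSocket);
        CheckSnmpResults(res.Pdu().Variables);
    }
}
```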
Hi, I am trying to mimic multithreading with a Parallel.ForEach loop. Below is my function:
public void PollOnServiceStart()
{
    constants = new ConstantsUtil();
    constants.InitializeConfiguration();

    HashSet<string> newFiles = new HashSet<string>();

    //string serviceName = MetadataDbContext.GetServiceName();
    var dequeuedItems = MetadataDbContext
        .UpdateOdfsServiceEntriesForProcessingOnStart();
    var handlers = Producer.GetParserHandlers(dequeuedItems);

    while (handlers.Any())
    {
        Parallel.ForEach(handlers,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            handler =>
            {
                Logger.Info($"Started processing a file remaining in Parallel ForEach");
                handler.Execute();
                Logger.Info($"Enqueing one file for next process");
                dequeuedItems = MetadataDbContext
                    .UpdateOdfsServiceEntriesForProcessingOnPollInterval(1);
                handlers = Producer.GetParserHandlers(dequeuedItems);
            });

        int filesRemovedCount = Producer.RemoveTransferredFiles();
        Logger.Info($"{filesRemovedCount} files removed from {Constants.OUTPUT_FOLDER}");
    }
}
To explain what's going on: the function UpdateOdfsServiceEntriesForProcessingOnStart() gets 4 file names (4 because of the parallel count) and adds them to thread-safe ParserHandler objects. These objects are then put into the list var handlers.
My idea here is to loop through this handler list and call the handler.Execute().
Handler.Execute() copies files from the network location onto a local drive, parses through the file and creates multiple output files, then sends said files to a network location and updates a DB table.
What I expect in this Parallel.ForEach loop is that after the Handler.Execute() call, the UpdateOdfsServiceEntriesForProcessingOnPollInterval(1) function will add a new file name from the DB table it reads into the dequeued items container, which is then passed as one item to the recreated handler list. That way, after one file is done executing, a new file takes its place in each parallel loop.
However, while I do get a new file added, it doesn't get executed by the next available thread. Instead, the Parallel.ForEach has to finish executing the first 4 files, and only then does it pick up the very next file. Meaning, after the first 4 are run in parallel, only 1 file runs at a time, nullifying the whole point of the parallel looping. The files that were added before all 4 initial files finished their Execute() calls are never executed.
I.e.: (Start1, Start2, Start3, Start4) all at once. What should happen next is something like (End2, Start5), then (End3, Start6). But what actually happens is (End2, End3, End1, End4), then Start5, End5, Start6, End6.
Why is this happening?
Because we want to deploy multiple instances of this service app on a machine, it is not beneficial to have a giant list waiting in a queue. This is wasteful, as the other app instances won't be able to process anything.
I am writing what should be a long comment as an answer, although it's an awful answer because it doesn't answer the question.
Be aware that parallelizing filesystem operations is unlikely to make them faster, especially if the storage is a classic hard disk. The head of the disk cannot be in N places at the same moment, and if you tell it to be, it will just waste most of its time traveling instead of reading or writing.
The best way to overcome the bottleneck imposed by accessing the filesystem is to make sure that there is work for the disk to do at all moments. Don't stop the disk's work to make a computation or to fetch/save data from/to the database. To make this happen you must have multiple workflows running concurrently: one workflow does nothing but disk I/O, another talks continuously to the database, a third utilizes the CPU by doing one calculation after another, etc. This approach is called task parallelism (doing heterogeneous work in parallel), as opposed to data parallelism (doing homogeneous work in parallel, the speciality of Parallel.ForEach). It is also called pipelining, because in order to make all workflows run concurrently you must place intermediate buffers between them, so you create a pipeline with the data flowing from buffer to buffer. Another term used for this kind of operation is the producer-consumer pattern, which describes a short pipeline consisting of only two building blocks, the first being the producer and the second the consumer.
The most powerful tool currently available¹ for creating pipelines is the TPL Dataflow library. It offers a variety of "blocks" (pipeline segments) that can be linked to each other and can cover most scenarios. You instantiate the blocks that will compose your pipeline, configure them, tell each one what work it should do, link them together, feed the first block with the initial raw data to be processed, and finally await the Completion of the last block. You can look at an example of using the TPL Dataflow library here.
¹ Available as a built-in library in the .NET platform. Powerful third-party tools also exist, Akka.NET for example.
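As a minimal sketch of the Dataflow approach applied to the question above (ParserHandler, Producer and dequeuedItems are taken from the question's code; the block configuration is illustrative): a bounded ActionBlock runs 4 handlers at a time and accepts a new file the moment one finishes, which is exactly the refill behavior the asker wanted. The bounded capacity also avoids a giant in-memory queue, so other service instances can pick up remaining files.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

var execute = new ActionBlock<ParserHandler>(
    handler => handler.Execute(),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 4, // at most 4 handlers run concurrently
        BoundedCapacity = 4         // backpressure: no giant waiting list
    });

// SendAsync waits while the block is full, so a finished handler
// frees a slot for the next file immediately
foreach (var handler in Producer.GetParserHandlers(dequeuedItems))
    await execute.SendAsync(handler);

execute.Complete();       // signal that no more input is coming
await execute.Completion; // wait for in-flight handlers to finish
```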
I'm running a Windows service using TopShelf (based on a console app in C# .NET 4.6.1) and I'm using AutoMapper 9.0.0. Every 10 seconds I run a task that processes about 1000 rows in a MS SQL database (using Entity Framework). It seems like AutoMapper is taking up a lot of memory, and the memory grows each time the task is run (in Task Manager I can see the service taking up over 3000 MB of RAM and climbing).
I am new to AutoMapper and don't know if there is anything I need to do to release the memory manually. Somewhere I saw a huge number of handlers, and I was wondering if AutoMapper generates these handlers and how I can clean them up.
I tried putting a GC.Collect() at the end of each task, but I don't see a difference.
Here is a code extract of my task:
private void _LiveDataTimer_Elapsed(object sender, ElapsedEventArgs e)
{
    // setting up Ninject data injection
    var kernel = new StandardKernel();
    kernel.Load(Assembly.GetExecutingAssembly());

    //var stationExtesions = kernel.Get<IStationExtensionRepository>();
    //var ops = kernel.Get<IOPRepository>();
    //var opExtensions = kernel.Get<IOPExtensionRepository>();
    //var periods = kernel.Get<IPeriodRepository>();
    //var periodExtensions = kernel.Get<IPeriodExtensionRepository>();

    // create the LiveDataTasks object
    //var liveDataTasks = new LiveDataTasks(stationExtesions, ops, opExtensions, periods, periodExtensions);

    // sync the station live data
    //liveDataTasks.SyncLiveStationData();

    // force garbage collection to prevent memory leaks
    //GC.Collect();

    Console.WriteLine("LiveDataTimer: Total available memory before collection: {0:N0}", System.GC.GetTotalMemory(false));
    System.GC.Collect();
    Console.WriteLine("LiveDataTimer: Total available memory after collection: {0:N0}", System.GC.GetTotalMemory(true));
}
MODIFICATIONS: I added some console outputs at the end of the code displaying the total memory used. I removed GC.Collect() because it doesn't change anything, and commented out most of the code accessing the database. Now I realize that kernel.Load(Assembly.GetExecutingAssembly()); already makes memory grow very fast. See the following console capture:
Now if I comment out kernel.Load(Assembly.GetExecutingAssembly()); I get a stable memory situation again. How can I Dispose of or unload the Kernel???
Well, first of all, you should not be doing database work in a service. Moving any big operation on DB data out of the DB only means moving the data over the network twice - once to the client program, once back to the DB - while also risking race conditions and a lot of other issues. My standing advice is: keep DB work in the DB at all times.
As for the memory footprint, this might just be a misreading of the used memory:
.NET uses the garbage collection approach to memory management. One effect of this is that while the GC for any given application does its collecting, all other threads have to be paused. As a result, the GC is pretty lazy about running; if it only runs once, on application closure, that is the ideal case, so it tries to avoid running unnecessarily before then. It will still run as much as it can before it ever throws an OutOfMemoryException at you. But short of that, it is perfectly happy to just keep allocating more and more objects without cleaning up.
You can test whether that is the case by calling GC.Collect(). However, such a call should generally never be in production code. An alternate GC strategy (in particular the one used for web servers) might be better.
I finally figured out what was happening: the kernel.Load(...) call used to set up Ninject data injection was increasing my memory:
var kernel = new StandardKernel();
kernel.Load(Assembly.GetExecutingAssembly());
So I moved this code from the function executed every x seconds to the constructor of the parent class where it is only executed once on initialisation.
This solved the problem.
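A hedged sketch of that fix, for anyone landing here later (the class name LiveDataService is illustrative; the kernel calls follow the snippets above):

```csharp
using System;
using System.Reflection;
using System.Timers;
using Ninject;

public class LiveDataService
{
    private readonly StandardKernel _kernel;

    public LiveDataService()
    {
        // build the kernel once, on initialisation, instead of on every tick
        _kernel = new StandardKernel();
        _kernel.Load(Assembly.GetExecutingAssembly());
    }

    private void _LiveDataTimer_Elapsed(object sender, ElapsedEventArgs e)
    {
        // resolve from the long-lived kernel; no per-tick kernel creation
        var stationExtensions = _kernel.Get<IStationExtensionRepository>();
        // ... remaining repositories and LiveDataTasks as in the question
    }
}
```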
Thanks guys for your inspiring help and comments!!!!
I have a Windows Service that has code similar to the following:
List<Buyer> buyers = GetBuyers();
var results = new List<Result>();

Parallel.ForEach(buyers, buyer =>
{
    // do some prep work, log some data, etc.
    // call out to an external service that can take up to 15 seconds each to return
    results.Add(Bid(buyer));
});

// Parallel foreach must have completed by the time this code executes
foreach (var result in results)
{
    // do some work
}
This is all fine and good and it works, but I think we're suffering from a scalability issue. We average 20-30 inbound connections per minute and each of those connections fire this code. The "buyers" collection for each of those inbound connections can have from 1-15 buyers in it. Occasionally our inbound connection count sees a spike to 100+ connections per minute and our server grinds to a halt.
CPU usage is only around 50% on each server (two load balanced 8 core servers) but the thread count continues to rise (spiking up to 350 threads on the process) and our response time for each inbound connection goes from 3-4 seconds to 1.5-2 minutes.
I suspect the above code is responsible for our scalability problems. Given this usage scenario (parallelism for I/O operations) in a Windows service (no UI), is Parallel.ForEach the best approach? I don't have a lot of experience with async programming and am looking forward to using this opportunity to learn more about it, so I figured I'd start here to get some community advice to supplement what I've been able to find on Google.
Parallel.ForEach has a terrible design flaw: it is prone to consuming all available thread-pool resources over time. The number of threads that it will spawn is literally unlimited. You can get up to 2 new ones per second, driven by heuristics that nobody understands. The CoreCLR has a hill-climbing algorithm built into it that just doesn't work.
call out to an external service
You should probably find out the right degree of parallelism for calling that service. You need to find it by testing different amounts.
Then, you need to restrict Parallel.ForEach to spawn only as many threads as you want at a maximum. You can do that using a fixed-concurrency TaskScheduler.
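If a hard cap is all that's needed, ParallelOptions.MaxDegreeOfParallelism gives it without a custom TaskScheduler. A sketch against the question's code (the limit of 8 is a placeholder to be found by testing; ConcurrentBag replaces the List, which is not safe to Add to from parallel threads):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

var results = new ConcurrentBag<Result>(); // thread-safe, unlike List<T>

Parallel.ForEach(
    buyers,
    new ParallelOptions { MaxDegreeOfParallelism = 8 }, // placeholder cap
    buyer => results.Add(Bid(buyer)));
```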
Or, you can change this to use async IO and SemaphoreSlim.WaitAsync. That way no threads are blocked; this solves the pool exhaustion, and the overloading of the external service as well.
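A minimal sketch of the SemaphoreSlim approach. BidAsync is a hypothetical async version of the question's Bid() call, and the limit of 8 is again a placeholder; Buyer and Result come from the question:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class BidProcessor
{
    // placeholder limit; find the real value by load-testing the external service
    private static readonly SemaphoreSlim _throttle = new SemaphoreSlim(8);

    public async Task<List<Result>> ProcessAsync(List<Buyer> buyers)
    {
        var tasks = buyers.Select(async buyer =>
        {
            await _throttle.WaitAsync(); // queues without blocking a thread
            try
            {
                return await BidAsync(buyer); // up to 15 s, but no thread is held
            }
            finally
            {
                _throttle.Release();
            }
        });
        return (await Task.WhenAll(tasks)).ToList();
    }
}
```

Because the 15-second external calls no longer pin threads, the thread count stays flat even at 100+ inbound connections per minute.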
I'm implementing a WCF service client which is aimed at testing several service methods. That's done using the standard generated proxy class created by Add Web Reference (inherited from System.Web.Services.Protocols.SoapHttpClientProtocol). What I need to do is execute a certain type of request many times simultaneously to see how it affects server performance (something like capacity testing for the server infrastructure).
Here's the problem - each of the responses to these requests is pretty large (~10-100 MB), and I see that only a few calls like
// parametersList.Count = 5
foreach (var param in parametersList)
{
    var serviceResponse = serviceWebReferenceProxy.ExecuteMethod(param);
    // serviceResponse is not saved anywhere else,
    // expected to be GC'd after iteration
}
cause the Private Bytes of the process to jump to ~500 MB and the Working Set to 200-300 MB. I suspect that running them in parallel and increasing the iteration count to 100-200 as needed will definitely cause a StackOverflow/OutOfMemoryException. How can this be done, then? I expect that not assigning the service method response to a variable would help, but that's a problem because I need to see each response's size. I'm looking for some sort of instant and guaranteed memory cleanup after each iteration.
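One hedged option for keeping sight of each response's size without keeping the responses alive: record only a size figure per iteration and let the response object go unreferenced. GetApproximateSize is a hypothetical helper (e.g. serializing the response to a counting stream); everything else follows the snippet above:

```csharp
using System.Collections.Generic;

var responseSizes = new List<long>(parametersList.Count);
foreach (var param in parametersList)
{
    var serviceResponse = serviceWebReferenceProxy.ExecuteMethod(param);
    // keep only the number, not the object graph
    responseSizes.Add(GetApproximateSize(serviceResponse)); // hypothetical helper
    // serviceResponse becomes unreachable here and is eligible for collection
}
```

This doesn't force an instant collection (nothing can guarantee that short of GC.Collect), but it removes the only reason each 10-100 MB graph was being kept reachable.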
I refactored the logic to reuse existing objects as much as possible, which made it possible to run more clients. After a certain period of time garbage collection becomes very slow, but performance is acceptable.
I have a simple console application that uses ZeroMQ to send and receive messages. In the receive portion, I have the following message pump code:
ZMQ.Context _context = new ZMQ.Context(1);
ZMQ.PollItem[] pollItems = new ZMQ.PollItem[0];

while (!_finished)
{
    if (pollItems.Length > 0)
        _context.Poll(pollItems, pollTimeout);
    else
        Thread.Sleep(1);

    if (_receiversChanged)
        UpdatePollItems(ref pollItems);
}
(The idea is that I can add and remove items from the poller at run-time, as I need to add receivers. UpdatePollItems simply creates a new array whenever the set of receivers changes.)
I have tried pollTimeout values of 50ms and 500ms but the app (which is sitting on its main thread on Console.ReadKey) still uses 100% of one core, even when no messages are being sent. I ran the app under the profiler and confirmed that it is ZMQ.Context.Poller that is chewing all the CPU.
Have others seen similar behaviour? I am using the latest ZeroMQ C# binding (clrzmq-x64.2.2.3 from NuGet).
Yes, there is a bug in the driver; I hit it as well. Looking at the code, it is possible that the .NET 4 version fares better, but you have to recompile it. I will check whether the code I rewrote could be reintegrated as a pull request.
I'm going to guess that when you say you are setting the poll timeout to 500 ms, you are setting the variable pollTimeout to 500. That would be incorrect. For a timeout of 500 ms, the variable pollTimeout should be set to 500000. If you do context.Poll(..., 500), it is interpreted as 500 µs and internally rounded off to 0 ms.
I verified on my own system that passing 500 to Poll causes CPU utilization between 90 and 100%. Setting the value to anything over 1000 makes CPU usage much lower, and for 500000 (500 ms) it should be negligible.
Either way, please update your code sample to include the initialization of the variable pollTimeout. If I am completely off base, then at least it will prevent other would-be answerers from going down this path.
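Assuming the microsecond interpretation above is right, the fix is a one-line change to the timeout's initialization. A sketch against the question's loop (names taken from the question's code):

```csharp
// clrzmq 2.x Poll() takes the timeout in microseconds, not milliseconds
const long pollTimeoutMs = 500;
long pollTimeout = pollTimeoutMs * 1000; // 500 ms -> 500000 µs

while (!_finished)
{
    if (pollItems.Length > 0)
        _context.Poll(pollItems, pollTimeout); // now sleeps up to 500 ms per call
    else
        Thread.Sleep(1);

    if (_receiversChanged)
        UpdatePollItems(ref pollItems);
}
```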