RabbitMQ: erl.exe taking high CPU usages - c#

I have implemented rabbitmq in my application and it's running on windows server 2008 server, the problem is that erl.exe taking high CPU usages like sometime it reaches 40-45% CPU usages, even in the ideal case (when not processing any queue) it takes at least 4-15% CPU usages.
What could be the reason for taking high CPU usages? Is there any setting or any other thing that I need to do.

You say that even when not processing a queue it is still at 4-15%, but is your application running? If you weren't before, try to monitor erl while no application is using Rabbit.
One thing that comes to mind is that you might be using the QueingBasicConsumer in a loop and that could be contributing to the CPU usage. If you are using QueingBasicConsumer and it is what is causing the hit, try substituting it with EventingBasicConsumer (such that you don't do busy waiting) and see if you have improvement.
Also, how is your application using Rabbit? According to the documentation every IConnection is backed up by a background thread and if you're creating a bunch of connections in your application it could be another reason for the slow down.

Related

CPU underutilized. Due to blocking I/O?

I am trying to find where lies the bottleneck of a C# server application which underutilize CPU. I think this may be due to poor disk I/O performance and has nothing to do with the application itself but I am having trouble making a fact out of this supposition.
The application reads messages from a local MSMQ queue, does some processing on each messages and after processing the messages, sends out response messages to another local MSMQ queue.
I am using an async loop to read messages from queue, dequeuing them as fast as possible and dispatching them for processing using Task.Run to launch the processing of each messages (and do not await on this Task.Run .. just attaching a continuation only faulted on it to log error). Each messages is processed concurrently, i.e there is no need to wait for a message to be fully processed before processing the next one.
At the end of the processing of a message, I am using the Send method of MessageQueue (somehow asynchronous but not really because it has to wait on disk write before returning -see System.Messaging - why MessageQueue does not offer an asynchronous version of Send).
For the benchmarks I am queuing 100K messages in the queue (approx 100MB total size for 100K messages) and then I launch the program. On two of my personal computers (SSD HD on one and SATA2 HD on the other with i7 CPU quadcores -8 logical proc-) I reach ~95% CPU usage for the duration of the program lifecyle (dequeuing the 100K messages, processing them and sending responses). Messages are dequeued as fast a possible, processed as fast as possible (CPU involved here) and then response for each message sent to different local queue.
Now on a virtual machine running non HT dual core CPU (have no idea what is the underlying disk but seems far less performant than mines... during benchmark, with Perfmon I can see avg disk sec/write arround 10-15 ms on this VM, whereas it is arround 2ms on my personal machines) when I am running the same bench, I only reach ~55% CPU (when I am running the same bench on the machine without sending response messages to queue I reach ~90% CPU).
I don't really understand what is the problem here. Seems clear that sending message to the queue is the problem and slows down the global processing of the program (and dequeuing of messages to be processed), but why would that be considering that I am using Task.Run to launch processing of each dequeued message and ultimately response sending, I would not expect CPU to be underutilized. Unless when one thread is sending a message it blocks other threads to run on the same core while it waits for the return (disk write) in which case it would maybe make sense considering latency is much higher than on my personal computers, but a thread waiting for I/O should not prevent other threads from running.
I am really trying to understand why I am not reaching at least 95% cpu usage on this machine. I am blindly saying this is due to poorer disk i/o performance, but still I don't see why it would lead to CPU underutilization considering I am running processing concurrently using Task.Run. It could also be some system problem completely unrelated to disk, but considering that MessageQueue.Send seems to be the problem and that this method ultimately writes messages to a memory mapped file + disk, I don't see where the performance issue could come from other than disk.
It is of course for sure a system performance issue as the program maximize CPU usage on my own computers, but I need to find what the bottleneck is exactly on the VM system, and why exactly it is affecting the concurrency / speed of my application.
Any idea ?
To examine poor disc and or cpu utilization there is only one tool: Windows Performance Toolkit. For an example how to use it see here.
You should get the latest one from the Windows 8.1 SDK (requires .NET 4.5.1) which gives you most capabilities but the one from the Windows 8 SDK is also fine.
There you get graphs % CPU Utilization and % Disc Utilization. If either one is at 100% and the other one is low then you have found the bottleneck. Since it is a system wide profiler you can check if the msmq service is using the disc badly or you or someone else (e.g. virus scanner is a common issue).
You can directly get to your call stacks and check which process and thread did wake your worker thread up which is supposed to run at full speed. Then you can jump to the readying thread and process and check what it did do before it could ready your thread. That way you can directly verify what was hindering it so long.
No more guessing. You can really see what the system is doing.
To analyze further enable in the CPU Usage Precise view the following columns:
NewProcess
NewThreadId
NewThreadStack(Frame Tags)
ReadyingProcess
ReadyingThreadId
Ready(us) Sum
Wait(us) Sum
Wait(us)
%CPU Usage
Then drill down for a call stack in your process to see where high Wait(us) times do occur in a thread that is supposed to run at full speed.. You can drill down to one single event until you can go no further. Then you will see values in Readying Process and ReadyingThreadId. Go to that process/thread (it can be your own) and repeat the process until you end up in some blocking operation which does either involve disc IO or sleeps or a long running device driver call (e.g virus scanner or the vm driver).
If the Disk I/O performance counters don't look abnormally high, I'd look next at the hypervisor level. Assuming you're running the exact same code, using a VM adds latency to the entire stack (CPU, RAM, Disk). You can perhaps tweak CPU Scheduling at the hypervisor level and see if this will increase CPU utilization.
I'd also consider using a RAMDisk temporarily for performance testing. This would eliminate the Disk/SAN latency and you can see if that fixes your problem.

Parallel Programing with Threads

Okay, I am bit confuse on what and how should I do. I know the theory of Parallel Programming and Threading, but here is my case:
We have number of log files in given folder. We read these log files in database. Usually reading these files take couple of hours to read, as we do it in serial method, i.e. we iterate through each file, then open a SQL transaction for each file and insert the log in database, then read another and do the same.
Now, I am thinking of using Parallel programming so I can consume all core of CPU, however I am still not clear if I use Thread for each file, will that make any difference to system? I mean if I create say 30 threads then will they run on single core or they run on Parallel ? How can I use both of them? if they are not already doing that?
EDIT: I am using Single Server, with 10K HDD Speed, and 4 Core CPU, with 4 GB RAM, no network operation, SQL Server is on same machine with Windows 2008 as OS. [can change OS if that help too :)].
EDIT 2: I run some test to be sure based on your feedbacks, here is what I found on my i3 Quad Core CPU with 4 GB RAM
CPU remains at 24-50% CPU1, CPU2 remain under 50% usage, CPU3 remain at 75% usage and CPU4 remains around 0%. Yes I have Visual studio, eamil client and lot of other application open, but this tell me that application is not using all core, as CPU4 remain 0%;
RAM remain constantly at 74% [it was around 50% before test], that is how we design the read. So, nothing to worry
HDD remain READ/Write or usage value remain less than 25% and even it spike to 25% in sine wave, as our SQL transaction first stored in memory and then it write to disk when memory is getting threshold, so again,
So all resources are under utilized here, and hence I think I can distribute work to make it efficient. Your thoughts again. Thanks.
First of all, you need to understand your code and why is it slow. If you're thinking something like “my code is slow and uses one CPU, so I'll just make it use all 4 CPUs and it will be 4 times faster”, then you're most likely wrong.
Using multiple threads makes sense if:
Your code (or at least a part of it) is CPU bound. That is, it's not slowed down by your disk, your network connection or your database server, it's slowed down by your CPU.
Or your code has multiple parts, each using a different resource. E.g. one part reads from a disk, another part converts the data, which requires lots of CPU and last part writes the data to a remote database. (Parallelizing this doesn't actually require multiple threads, but it's usually the simplest way to do it.)
From your description, it sounds like you could be in situation #2. A good solution for that is the producer consumer pattern: Stage 1 thread reads the data from the disk and puts it into a queue. Stage 2 thread takes the data from the queue, processes them and puts them into another queue. Stage 3 thread takes the processed data from the second queue and saves them to the database.
In .Net 4.0, you would use BlockingCollection<T> for the queue between the threads. And when I say “thread”, I pretty much mean Task. In .Net 4.5, you could use blocks from TPL Dataflow instead of the threads.
If you do it this way then you can get up to three times faster execution (if each stage takes the same time). If Stage 2 is the slowest part, then you can get another speedup by using more than one thread for that stage (since it's CPU bound). The same could also apply to Stage 3, depending on your network connection and your database.
There is no definite answer to this question and you'll have to test because as mentionned in my comments:
if the bottleneck is the disk I/O then you won't gain a lot by adding more threads and you might even worsen performance because more threads will be fighting to get access to the disk
if you think disk I/O is OK but CPU loads is the issue then you can add some threads, but no more than the number of cores because here again things will worsen due to context switching
if you can do more disk and network I/Os and CPU load is not high (very likely) then you can oversubscribe with (far) more threads than cores: typically if your threads are spending much of their time waiting for the database
So you should profile first, and then (or directly if you're in a hurry) test different configurations, but chances are you'll be in the third case. :)
First, you should check what is taking the time. If the CPU actually is the bottleneck, parallel processing will help. Maybe it's the network and a faster network connection will help. Maybe buying a faster disc will help.
Find the problem before thinking about a solution.
Your problem is not using all CPU, your action are mainly I/O (reading file , sending data to DB).
Using Thread/Parallel will make your code run faster since you are processing many files at the same time.
To answer your question , the framework/OS will optimize running your code over the different cores.
It varies from machine to machine but speaking generally if you have a dual core processor and you have 2 threads the Operating System will pass one thread to one core and the other thread to the other. It doesn't matter how many cores you use what matters is whether your equation is the fastest. If you want to make use of Parallel programming you need a way of sharing the workload in a way that logically makes sense. Also you need to consider where your bottleneck is actually occurring. Depending on the size of the file it may be simply the max speed of your read/write of the storage medium that is taking so long.As a test I suggest you log where the most time in your code is being consumed.
A simple way to test whether a non-serial approach will help you is to sort your files in some order divide the workload between 2 threads doing the same job simultaneously and see if it makes a difference. If a second thread doesn't help you then I guarantee 30 threads will only make it take longer due to the OS having to switch threads back and fourth.
Using the latest constructs in .Net 4 for parallel programming, threads are generally managed for you... take a read of getting started with parallel programming
(pretty much the same as what has happened more recently with async versions of functions to use if you want it async)
e.g.
for (int i = 2; i < 20; i++)
{
var result = SumRootN(i);
Console.WriteLine("root {0} : {1} ", i, result);
}
becomes
Parallel.For(2, 20, (i) =>
{
var result = SumRootN(i);
Console.WriteLine("root {0} : {1} ", i, result);
});
EDIT: That said, it would be productive / faster to perhaps also put intensive tasks into seperate threads... but to manually make your application 'Multi-Core' and have things like certain threads running on particular cores, that isn't currently possible, that's all managed under the hood...
have a look at plinq for example
and .Net Parallel Extensions
and look into
System.Diagnostics.Process.GetCurrentProcess().ProcessorAffinity = 4
Edit2:
Parallel processing can be done inside a single core with multiple threads.
Multi-Core processing means distributing those threads to make use of the multiple cores in a CPU.

Throttle CPU Usage of Application

A new game server just came out which our company would like to offer for rental. However, the game developers did not create any sort of hibernation mode to shut down the physics when no players are connected, so an empty server is eating 30% or so CPU.
I found this game panel addon which limits the CPU usage of Applications.
I have written a few small apps in C# .NET for our company to help improve our services and I am wondering how I would go about creating something like this. Is it possible?
You might consider simply lowering the priority of the process down. This won't limit CPU directly, but will cause the processes threads to be scheduled less often than processes with normal and higher priorities.
Check System.Diagnostics.Process.PriorityClass (doc)
My guess is that the server app is doing polling instead being event driven. Polling will use CPU unless this piece of code is converted to be event driven. The application will sleep until it receives an event from the OS that it needs to process. Polling will just spin looking for an event and wastes the CPU. Reducing the priority of the process will not really help unless with CPU usage reduction in any way. This app needs to be rewritten to be more CPU efficient.
This answer might be interesting for you and that's how I would do it.
How to restrict the CPU usage a C# program takes?
I don't know if you can do that, but you can change the thread priority of the executing thread via the Priority property. You would set that by:
Thread.CurrentThread.Priority = ThreadPriority.Lowest;
Also, I don't think you really want to cap it. If the machine is otherwise idle, you'd like it to get busy on with the task, right? ThreadPriority helps communicate this to the scheduler.
I'm assuming the game server is threaded. If this is the case, you may be able to pragmatically force CPU affinity on the application. If you had a way to tell if the game had users or not, ie if UDP packets are coming in on the assigned port, you could say "hey, no one is connected". You could then have your program force all working threads onto the same core.
So, if you had an 8 core cpu and all the threads were on one core, then at most it would use 12.5% cpu.
Once you see packets coming in on the assigned port, you could assign the affinity back to all cores.
You could take this a step further and say "Are there any "idle" games. If there are any idle games, which are all on.. lets say.. core 7, then run an infinite loop of the HLT instruction at a higher priority than the game, but force the thread to sleep so it doesn't completely starve the game.
This would cause the CPU to use less power, but would be a lot more work and have a higher chance of problems.
I would stick to forcing affinity only, and just let all the idle games share some given core.

C# - Moving files - to queue or multi-thread

I have an app that moves a project and its files from preview to production using a Flex front-end and a .NET web service. Currently, the process takes about 5-10 mins/per project. Aside from latency concerns, it really shouldn't take that long. I'm wondering whether or not this is a good use-case for multi-threading. Also, considering the user may want to push multiple projects or one right after another, is there a way to queue the jobs.
Any suggestions and examples are greatly appreciated.
Thanks!
Something that does heavy disk IO typically isn't a good candidate for multithreading since the disks can really only do one thing at a time. However, if you're pushing to multiple servers or the servers have particularly good disk subsystems some light threading may be beneficial.
As a note - regardless of whether or not you decide to queue the jobs, you will use multi-threading. Queueing is just one way of handling what is ultimately solved using multi-threading.
And yes, I'd recommend you build a queue to push out each project.
You should compare the speed of your code compared to just copying in Windows (i.e., explorer or command line) vs copying with something advanced like TeraCopy. If your code is significantly slower than Window then look at parts in your code to optimize using a profiler. If your code is about as fast as Windows but slower than TeraCopy, then multithreading could help.
Multithreading is not generally helpful when the operation I/O bound, but copying files involves reading from the disk AND writing over the network. This is two I/O operations, so if you separate them onto different threads, it could increase performance. For something like this you need a producer/consumer setup where you have a Circular queue with one thread reading from disk and writing to the queue, and another thread reading from the queue and writing to the network. It'll be important to keep in mind that the two threads will not run at the same speed, so if the queue gets full, wait before writing more data and if it's empty, wait before writing. Also the locking strategy could have a big impact on performance here and could cause the performance to degrade to slower than a single-threaded implementation.
If you're moving things between just two computers, the network is going to be the bottleneck, so you may want to queue these operations.
Likewise, on the same machine, the I/O is going to be the bottleneck, so you'd want to queue there, too.
You should try using the ThreadPool.
ThreadPool.QueueUserWorkItem(MoveProject, project);
Agreed with everyone over the limited performance of running the tasks in parallel.
If you have full control over your deployment environment, you could use Rhino Queues:
http://ayende.com/Blog/archive/2008/08/01/Rhino-Queues.aspx
This will allow you to produce a queue of jobs asynchronously (say from a WCF service being called from your Silverlight/Flex app) and consume them synchronously.
Alternatively you could use WCF and MSMQ, but the learning curve is greater.
When dealing with multiple files using multiple threads usually IS a good idea in concerns of performance.The main reason is that most disks nowadays support native command queuing.
I wrote an article recently about reading/writing files with multiple files on ddj.com.
See http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=220300055.
Also see related question
Will using multiple threads with a RandomAccessFile help performance?
In particular i made the experience that when dealing with very many files it IS a good idea to use a number of threads. In contrary using many thread in many cases does not slow down applications as much as commonly expected.
Having said that i'd say there is no other way to find out than trying all possible different approaches. It depends on very many conditions: Hardware, OS, Drivers etc.
The very first thing you should do is point any kind of profiling tool towards your software. If you can't do that (like, if you haven't got such a tool), insert logging code.
The very first thing you need to do is figure out what is taking a long time to complete, and then why is it taking a long time to complete. That your "copy" operation as a whole takes a long time to complete isn't good enough, you need to pinpoint the reason for this down to a method or a set of methods.
Until you do that, all the other things you can do to your code will likely be guesswork. My experience has taught me that when it comes to performance, 9 out of 10 reasons for things running slow comes as surprises to the guy(s) that wrote the code.
So measure first, then change.
For instance, you might discover that you're in fact reporting progress of copying the file on a byte-per-byte basis, to a GUI, using a synchronous call to the UI, in which case it wouldn't matter how fast the actual copying can run, you'll still be bound by message handling speed.
But that's just conjecture until you know, so measure first, then change.

C# service with Oracle .NET provider gets slower and slower

I have a C# service written for .NET 2.0 that uses the Oracle data access provider for .NET 2.102.2.20. The service runs multiple threads and runs lots of queries to an Oracle 9.2 database. I am using NHibernate.
What I am seeing is that when it starts up it runs fast, then gets slower and slower. The CPU usage starts low then goes up and up. After a few minutes it is crawling and the CPU is at 100%. I have looked in my code and find nothing that could be doing this. Percent time in GC is < 5%. I have tried varying the ODP.NET parameters to no avail.
Anyone have an idea what could be doing this?
More detail:
The threads are worker threads and need to keep running all the time. I have no runaway threads, they are doing real work. I have profiled the program and it looks like it is spending a lot of its time inside the Oracle provider, which you would expect, but why would it use so much CPU? It's as if it's spinning waiting for resultsets or something, but it doesn't happen right away, only after a while.
Update:
It may be something to do with the .NET CLR on the server.
A multi-threaded test program that does a lot of string manipulation also shows the same behavior on this machine, starting out fast, then slowing down over the course of 15 minutes to about 1/3 of the speed. The test program does not show this slowdown behavior on a another identical server with the same OS version and the same (we think) .NET CLR version.
Update:
It is now looking like this is a problem with the server overheating and thermal protection kicking in and slowing down the CPUs' frequency.
Update:
A firmware update for the server fixed this. I still think it was an overheating problem because if I stopped running the process and let the server "rest" for a little while it would start running fast when I restarted the process, but if I killed the process and restarted it right away the process would start already running slow. My guess is that the firmware control for the fans was malfunctioning so the CPUs would heat up and slow down. If I stopped running for a little while they would cool down and run fast again, until they heated up again.
Sounds like your threads aren't terminated and keep running all the time or you've some serious memory leaks. Without more information, this could be anything.
If CPU is at 100% you're probably facing a runaway thread. You can diagnose this using WinDbg + SoS. Attach to the process and use the !runaway command to get an overview of how much CPU each thread is using. Then use !clrstack to find out what the runaway thread is doing. Let me know if you need additional details.
I'm getting the exact same issue. The OracleConnection gets progressively slower and slower. What's interesting is that if I call:
cn.Close();
OracleConnection.ClearPool(cn);
every time, it never slows down.
It must have something to do with the oracle connection (caching??)

Categories