Infinite looping consumes 100% CPU - c#

I am stuck in a situation where I need to generate a signal at a defined frequency of some Hz. I have tried multimedia timers and everything else I could find on the internet, but so far an infinite loop with some if-else conditions has given me the best results. The problem with this approach is that it consumes almost all of the CPU, leaving no room for other applications to work properly.
I need an algorithm that generates a frequency anywhere from a few Hz up to the kHz range.
I am using the Windows platform with C#.

You can't accurately generate a signal of a fixed frequency on a non-realtime platform. At any moment, a high-priority thread from the same or another process could preempt your thread. E.g. when the GC thread kicks in, worker threads will be suspended.
That being said, what is the jitter you are allowed to have and what is the highest frequency you need to support?

The way I would approach the problem would be to generate a "sound wave" and output it on the sound card.
There are ways to access your sound card from C#, e.g. using XNA.
As others have pointed out, using the CPU for that isn't a good approach.
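A minimal sketch of the "sound wave" idea: fill a buffer with 16-bit PCM samples of a sine wave at the target frequency, so that the sound hardware, rather than a busy loop, does the timing. The names ToneGenerator and Generate, the amplitude default, and the sample rate in the usage note are assumptions made for illustration; handing the buffer to the sound card is left out because it depends on the audio library you pick.

```csharp
using System;

public static class ToneGenerator
{
    // Returns mono 16-bit PCM samples of a sine wave.
    // frequencyHz should stay below sampleRate / 2 (the Nyquist limit).
    public static short[] Generate(double frequencyHz, int sampleRate,
                                   double seconds, double amplitude = 0.8)
    {
        int count = (int)Math.Round(sampleRate * seconds);
        var samples = new short[count];

        // Phase advance per sample, in radians.
        double step = 2.0 * Math.PI * frequencyHz / sampleRate;

        for (int i = 0; i < count; i++)
        {
            double value = amplitude * short.MaxValue * Math.Sin(step * i);
            samples[i] = (short)Math.Round(value);
        }
        return samples;
    }
}
```

For example, ToneGenerator.Generate(1000, 44100, 0.5) would produce half a second of a 1 kHz tone at a 44.1 kHz sample rate.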

Use Thread.Sleep(delay); in your loop.
It reduces processor usage.

As you are generating a timer, you will want to sleep based on the period of your frequency.
You will have to run your frequency generator in its own thread and call
Thread.Sleep(yourFrequencyPeriodMs);
in each iteration of the period.

Related

Dealing with extremely small increments of time

OK, that title was perhaps vague, but allow me to explain.
I'm dealing with a large list, of hundreds of messages to be sent to a CAN bus as byte arrays. Each of these messages has an Interval property detailing how often the message must be sent, in milliseconds. But I'll get back to that.
So I have a thread. The thread loops through this giant list of messages until stopped, with the body roughly like this:
Stopwatch timer = new Stopwatch();
timer.Start();
while (!ShouldStop)
{
    // Poll every message on each pass and send the ones that are due.
    foreach (Message msg in list)
    {
        if (msg.IsReadyToSend(timer)) msg.Send();
    }
}
This works great, with phenomenal accuracy in honoring the Message objects' Interval. However, it hogs an entire CPU. The problem is that, because of the massive number of messages and the nature of the CAN bus, there is generally less than half a millisecond before the thread has to send another message. There would never be a case the thread would be able to sleep for, say, more than 15 milliseconds.
What I'm trying to figure out is if there is a way to do this that allows for the thread to block or yield momentarily, allowing the processor to sleep and save some cycles. Would I get any kind of accuracy at all if I try splitting the work into a thread per message? Is there any other way of doing this that I'm not seeing?
EDIT: It may be worth mentioning that the Message's Interval property is not absolute. As long as the thread continues to spew messages, the receiver should be happy, but if the thread regularly sleeps for, say, 25 ms because of higher priority threads stealing its time-slice, it could raise red flags for the receiver.
Based on the updated requirement, there is a very good chance that the default setup with Sleep(0) could be enough: messages may be sent in small bursts, but it sounds like that is OK. Using a multimedia timer may make the bursts less noticeable. Building more tolerance into the receiver of the messages may be a better approach (if possible).
If you need hard millisecond accuracy with good guarantees, C# on Windows is not the best choice: separate hardware (even an Arduino) may be needed, or at least lower-level code than C#.
Windows is not an RT OS, so you can't really get sub-millisecond accuracy.
A busy loop (possibly on a high-priority thread), as you have, is a common approach if you need sub-millisecond accuracy.
You can try using multimedia timers (sample: Multimedia timer interrupts in C# (first two interrupts are bad)), as well as changing the default time slice to 1 ms (see Why are .NET timers limited to 15 ms resolution? for a sample/explanation).
In any case, you should be aware that your code can lose its time slice if there are other higher-priority threads to be scheduled, and all your efforts would be lost.
Note: you should obviously consider whether a more sensible data structure would be more suitable (e.g. a heap or priority queue may work better for finding the next item).
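As a rough sketch of the priority-queue suggestion, the messages can be keyed by their next due time so that each pass only looks at the earliest one instead of scanning the whole list. ScheduledMessage and MessageScheduler are hypothetical names standing in for the question's Message type, and the sketch assumes .NET 6 or later for System.Collections.Generic.PriorityQueue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Message type.
public sealed class ScheduledMessage
{
    public string Name { get; }
    public long IntervalTicks { get; }

    public ScheduledMessage(string name, long intervalTicks)
    {
        if (intervalTicks <= 0) throw new ArgumentOutOfRangeException(nameof(intervalTicks));
        Name = name;
        IntervalTicks = intervalTicks;
    }
}

public sealed class MessageScheduler
{
    // Min-heap keyed by the time (in caller-defined ticks) at which each message is due.
    private readonly PriorityQueue<ScheduledMessage, long> _queue = new();

    public void Add(ScheduledMessage msg, long firstDueTicks) => _queue.Enqueue(msg, firstDueTicks);

    // Due time of the earliest message, or null when nothing is scheduled.
    public long? NextDue => _queue.TryPeek(out _, out long due) ? due : (long?)null;

    // Removes every message due at or before 'now' and reschedules each one
    // a full interval after its previous due time, so errors do not accumulate.
    // A message that has fallen several intervals behind is returned once per missed interval.
    public List<ScheduledMessage> TakeDue(long now)
    {
        var due = new List<ScheduledMessage>();
        while (_queue.TryPeek(out var msg, out long when) && when <= now)
        {
            _queue.Dequeue();
            due.Add(msg);
            _queue.Enqueue(msg, when + msg.IntervalTicks);
        }
        return due;
    }
}
```

The sending thread could then wait or spin only until NextDue rather than polling every message on every pass.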
As you have discovered, the most accurate way to "wait" on a CPU is to poll the RTC. However, that is computationally intensive. If you need clock-level accuracy in your timing, there is no other way.
However, in your original post, you said that the timing was in the order of 15ms.
On my 3.3GHz Quad Core i5 at home, 15ms x 3.3GHz = 50 Million Clock cycles (or 200 million if you count all the cores).
That is an eternity.
Loose sleep timing is most likely more than accurate enough for your purposes.
To be frank, if you need hard RT, C# on the .NET VM, with the .NET GC, on the Windows kernel is the wrong choice.

C# - using Thread.Sleep() to get my cycle run several hundred times per second

I am developing an application which analyses real-time financial data. Currently my main computational cycle has the following design:
long cycle_counter = 0;
while (process_data)
{
    // analyse data, issue instruction: 5000 lines of straightforward code with computations
    cycle_counter++;
    Thread.Sleep(5);
}
When I run this application on my notebook (one Core i5 processor), the cycle runs 200-205 times per second, which is more or less as expected (if you don't worry about why it runs more than 200 times a second).
But when I deploy the application on "real" workstation, which has 2 6-core Xeon processors and 24 GB of fast RAM, and which loads Win7 in about 3 seconds, the application runs the cycle about 67 times per second.
My questions are:
why is this happening?
how can I influence the number of runs per second in this situation?
are there any better solutions for running the cycle 200-1000 times per second? I am now thinking about just removing Thread.Sleep() (the way I use it here is criticised a lot). With 12 cores I have no problem dedicating one core to this cycle. But might there be some downside to such a solution?
Thank you for your ideas.
The approach you're taking is simply fundamentally broken. Polling strategies are in general a bad way to go, and any time you do a Sleep for a reason other than "I want to give the rest of my timeslice back to the operating system", you're probably doing something wrong.
A better way to approach the problem is:
Make a threadsafe queue of unprocessed work
Make one thread that puts new work in the queue
Make n threads that take work out of the queue and do the work. n should be the number of CPUs you have minus one. If you have more than n threads then at least two threads are trading off CPU time, which is making them both slower!
The worker threads do nothing but sit in a loop taking work out of the queue and doing the work.
If the queue is empty then the "take work out" blocks.
When new work arrives, one of the blocked threads is reactivated.
How to build a queue with these properties is a famous problem called The Producer/Consumer Problem. There are lots of articles on how to do it and many implementations of blocking producer-consumer queues. I recommend finding an existing debugged one rather than trying to write your own; getting it right can be tricky.
Windows is not an RTOS (Real Time Operating System), so you cannot precisely determine when your thread will resume. Thread.Sleep(5) really means "wake me up no sooner than 5 ms from now". The actual sleep time is determined by the specific hardware and mostly by the system load. You can try to work around the system load issue by running your application at a higher priority.
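One existing, debugged implementation in .NET is System.Collections.Concurrent.BlockingCollection<T>. The sketch below is only an illustration of the pattern described above: the class name WorkQueueDemo, the bounded capacity of 100, and the "sum of squares" workload are assumptions chosen to keep the example small.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WorkQueueDemo
{
    // Processes 'items' on 'workerCount' worker threads and returns the sum of their squares.
    public static long SumOfSquares(int[] items, int workerCount)
    {
        // Bounded so the producer blocks instead of growing the queue without limit.
        using var queue = new BlockingCollection<int>(boundedCapacity: 100);
        long total = 0;

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks while the queue is empty and
                // finishes once CompleteAdding was called and the queue is drained.
                foreach (int item in queue.GetConsumingEnumerable())
                    Interlocked.Add(ref total, (long)item * item);
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        // Producer: the calling thread puts the work into the queue.
        foreach (int item in items)
            queue.Add(item);
        queue.CompleteAdding();

        Task.WaitAll(workers);
        return total;
    }
}
```

No worker ever calls Sleep here: an idle worker is parked inside the queue and is woken when new work arrives.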
BTW, System.Threading.Timer is a better approach (above comments still apply though).
The resolution of Sleep is dictated by the current timer tick interval and is usually either 10 or 15 milliseconds depending on the edition of Windows. This can be changed, however, by issuing a timeBeginPeriod command. See this answer.
Check your timer's actual frequency: many hardware timers have an actual resolution of
65536 ticks per hour = 65536 / 3600 = 18.204 ticks per second.
This is the so-called "18.2" constant, which is why the actual timer resolution is 1/18.2 s = 55 ms; for Sleep(5) this means the call could behave like either Sleep(0) or Sleep(55), depending on rounding.
I'm not sure it is the best approach, but here is another one.
Try BlockingCollection, where all you do in the producer is add and sleep.
The consumer then has the option to work full time if needed.
This still does not explain why the higher-powered PC ran fewer cycles.
Is it OK for you to run your loop 200 times per second on average?
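var delay = TimeSpan.FromMilliseconds(5);
var nextDue = DateTime.Now.Add(delay); // absolute due time of the next iteration
while (process_data) {
    Console.WriteLine("do work");
    var now = DateTime.Now;
    if (now < nextDue)
        System.Threading.Thread.Sleep(nextDue - now);
    // Advance by a fixed step so that per-iteration errors do not accumulate.
    nextDue = nextDue.Add(delay);
}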
Using this technique, your loop will run somewhat unevenly, but it should be OK on average, as the code depends neither on the resolution of Sleep nor on the resolution of DateTime.Now.
You might even combine this approach with a Timer.

Using multiple threads to bruteforce passwords

I'm working on my 10th grade science fair project right now and I've kind of hit a wall. My project is testing the effect of parallelism on the efficiency of brute forcing MD5 password hashes. I'll be calculating the number of password combinations per second it tests to see how efficient it is, using 1, 4, 16, 32, 64, 128, 512, and 1024 threads. I'm not sure if I'll do dictionary brute force or pure brute force. I figure that dictionary would be easier to parallelize; just split the list up into equal parts for each thread. I haven't written much code yet; I'm just trying to plan it out before I start coding.
My questions are:
Is calculating the password combinations tested/second the best way to determine the performance based on # of threads?
Dictionary or pure brute force? If pure brute force, how would you split up the task into a variable number of threads?
Any other suggestions?
I'm not trying to dampen your enthusiasm, but this is already quite a well understood problem. I'll try to explain what to expect below. But maybe it would be better to do your project in another area. How's about "Maximising MD5 hashing throughput" then you wouldn't be restricted to just looking at threading.
I think that when you write up your project, you'll need to offer some kind of analysis as to when parallel processing is appropriate and when it isn't.
Each time that your CPU changes to another thread, it has to persist the current thread context and load the new thread context. This overhead does not occur in a single-threaded process (except for managed services like garbage collection). So all else equal, adding threads won't improve performance because it must do the original workload plus all of the context switching.
But if you have multiple CPUs (cores) at your disposal, creating one thread per CPU will mean that you can parallelize your calculations without incurring context switching costs. If you have more threads than CPUs then context switching will become an issue.
There are 2 classes of computation: IO-bound and compute-bound. An IO-bound computation can spend large amounts of CPU cycles waiting for a response from some hardware like a network card or a hard disk. Because of this overhead, you can increase the number of threads to the point where the CPU is maxed out again, and this can cancel out the cost of context switching. However there is a limit to the number of threads, beyond which context switching will take up more time than the threads spend blocking for IO.
Compute-bound computations simply require CPU time for number crunching. This is the kind of computation used by a password cracker. Compute-bound operations do not get blocked, so adding more threads than CPUs will slow down your overall throughput.
The C# ThreadPool already takes care of all of this for you - you just add tasks, and it queues them until a Thread is available. New Threads are only created when a thread is blocked. That way, context switches are minimised.
I have a quad-core machine - breaking the problem into 4 threads, each executing on its own core, will be more or less as fast as my machine can brute force passwords.
To seriously parallelize this problem, you're going to need a lot of CPUs. I've read about using the GPU of a graphics card to attack this problem.
There's an analysis of attack vectors that I wrote up here if it's any use to you. Rainbow tables and the processor/memory trade offs would be another interesting area to do a project in.
To answer your question:
1) There is no single best way to test thread performance. Different problems scale differently with threads, depending on how independent each operation in the target problem is. So you can try the dictionary approach. But when you analyse the results, keep in mind that they might not apply to all problems. One very popular example, however, is a shared counter that each thread increments a fixed number of times.
2) Brute force will cover a large number of cases. In fact, with brute force there can be an infinite number of possibilities. So you might have to limit your passwords by some constraints, like the maximum length of the password and so on. One way to distribute brute force is to assign each thread a different starting character for the password. The thread then tests all possible passwords for that starting character. Once the thread finishes its work, it gets another starting character, until all possible starting symbols have been used.
3) One suggestion that I would like to give you is to test with a smaller number of threads. You are going up to 1024 threads. That is not a good idea. The number of cores on a machine is generally 4 to 10, so try not to let the number of threads exceed the number of cores by a huge margin, because a processor cannot run multiple threads at the same time: it's one thread per processor at any given time. Instead, try to measure performance for different schemes of assigning the problem to different threads.
Let me know if this helps!
One solution that will work for both a dictionary and a brute-force search of all possible passwords is to use an approach based around dividing the job up into work units. Have a shared object responsible for dividing the problem space up into units of work (ideally, something like 100 ms to 5 seconds' worth of work each) and give a reference to this object to each thread you start. Each thread then operates in a loop like this:
for work_block in work_block_generator.get():
    for item in work_block:
        # Do work
The advantage of this over just parcelling up the whole workspace into one chunk per thread up-front is that if one thread works faster than others, it won't run out of work and just sit idle - it'll pick up more chunks.
Ideally your work item generator would have an interface that, when called, returns an iterator, which itself returns individual passwords to test. The dictionary-based one, then, selects a range from the dictionary, while the brute force one selects a prefix to test for each batch. You'll need to use synchronization primitives to stop races between different threads trying to grab work units, of course.
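A minimal C# sketch of such a shared work-unit object, under the assumption that every candidate password can be addressed by an index (a line number in the dictionary, or a counter for brute force). WorkUnitDispenser and TryGetNext are hypothetical names; the synchronization primitive used here is Interlocked.Add, though a lock would work just as well.

```csharp
using System;
using System.Threading;

public sealed class WorkUnitDispenser
{
    private readonly long _total;     // number of candidates in the whole problem space
    private readonly long _unitSize;  // candidates per unit of work
    private long _next;               // index of the first candidate not yet handed out

    public WorkUnitDispenser(long totalCandidates, long unitSize)
    {
        if (unitSize <= 0) throw new ArgumentOutOfRangeException(nameof(unitSize));
        _total = totalCandidates;
        _unitSize = unitSize;
    }

    // Hands out the next half-open index range [start, end).
    // Returns false once the problem space is exhausted.
    // Safe to call from many threads: Interlocked.Add claims a range atomically.
    public bool TryGetNext(out long start, out long end)
    {
        start = Interlocked.Add(ref _next, _unitSize) - _unitSize;
        if (start >= _total)
        {
            end = start;
            return false;
        }
        end = Math.Min(start + _unitSize, _total);
        return true;
    }
}
```

Each worker thread would loop on TryGetNext and test the candidates in the range it receives, so a fast thread simply claims more units than a slow one.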
In both the dictionary and brute force methods, the problem is Embarrassingly Parallel.
To divide the problem for brute force with n threads, just split, say, the first two (or three) letters (the "prefix") into n pieces. Then each thread has a set of assigned prefixes, like "aa - fz", and it is responsible only for testing everything that follows its prefixes.
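The prefix split could look roughly like this in C#. PrefixPartitioner is a hypothetical helper written for illustration; it uses two-letter prefixes over a caller-supplied alphabet and gives each thread one contiguous slice.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixPartitioner
{
    // Splits all two-letter prefixes over 'alphabet' into 'threadCount' contiguous slices.
    public static List<List<string>> Split(string alphabet, int threadCount)
    {
        if (threadCount <= 0) throw new ArgumentOutOfRangeException(nameof(threadCount));

        // Enumerate prefixes in lexical order of the alphabet: "aa", "ab", ...
        var prefixes = new List<string>();
        foreach (char first in alphabet)
            foreach (char second in alphabet)
                prefixes.Add($"{first}{second}");

        var slices = new List<List<string>>();
        for (int t = 0; t < threadCount; t++)
        {
            // Integer arithmetic keeps slice sizes within one prefix of each other.
            int from = (int)((long)prefixes.Count * t / threadCount);
            int to = (int)((long)prefixes.Count * (t + 1) / threadCount);
            slices.Add(prefixes.GetRange(from, to - from));
        }
        return slices;
    }
}
```

With a 26-letter alphabet there are 676 prefixes, so for 4 threads each slice would hold 169 of them.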
Dictionary is usually statistically slightly better in practice for cracking more passwords, but brute force, since it covers everything, cannot miss a password within the target length.

Throttle CPU Usage of Application

A new game server just came out which our company would like to offer for rental. However, the game developers did not create any sort of hibernation mode to shut down the physics when no players are connected, so an empty server is eating 30% or so CPU.
I found this game panel addon which limits the CPU usage of Applications.
I have written a few small apps in C# .NET for our company to help improve our services and I am wondering how I would go about creating something like this. Is it possible?
You might consider simply lowering the priority of the process. This won't limit CPU usage directly, but it will cause the process's threads to be scheduled less often than those of processes with normal and higher priorities.
Check System.Diagnostics.Process.PriorityClass (doc)
My guess is that the server app is polling instead of being event driven. Polling will use CPU until that piece of code is converted to be event driven. An event-driven application sleeps until it receives an event from the OS that it needs to process, whereas polling just spins looking for an event and wastes CPU. Reducing the priority of the process will not really reduce its CPU usage in any way. This app needs to be rewritten to be more CPU efficient.
This answer might be interesting for you and that's how I would do it.
How to restrict the CPU usage a C# program takes?
I don't know if you can do that, but you can change the thread priority of the executing thread via the Priority property. You would set that by:
Thread.CurrentThread.Priority = ThreadPriority.Lowest;
Also, I don't think you really want to cap it. If the machine is otherwise idle, you'd like it to get on with the task, right? ThreadPriority helps communicate this to the scheduler.
I'm assuming the game server is threaded. If this is the case, you may be able to programmatically force CPU affinity on the application. If you had a way to tell whether the game had users or not, i.e. whether UDP packets are coming in on the assigned port, you could say "hey, no one is connected". You could then have your program force all working threads onto the same core.
So, if you had an 8 core cpu and all the threads were on one core, then at most it would use 12.5% cpu.
Once you see packets coming in on the assigned port, you could assign the affinity back to all cores.
You could take this a step further and ask "Are there any idle games?" If there are any idle games, which are all on, let's say, core 7, then run an infinite loop of the HLT instruction at a higher priority than the game, but force the thread to sleep so it doesn't completely starve the game.
This would cause the CPU to use less power, but would be a lot more work and have a higher chance of problems.
I would stick to forcing affinity only, and just let all the idle games share some given core.
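A minimal sketch of forcing affinity from C#, using the Process.ProcessorAffinity property (a bitmask where bit N allows logical core N; it is supported on Windows and Linux). AffinityHelper and its method names are assumptions made for illustration, and detecting whether a server is idle is left out.

```csharp
using System;
using System.Diagnostics;

public static class AffinityHelper
{
    // Bitmask that allows only one logical core (bit N set = core N allowed).
    public static IntPtr SingleCoreMask(int coreIndex)
    {
        if (coreIndex < 0 || coreIndex >= Environment.ProcessorCount)
            throw new ArgumentOutOfRangeException(nameof(coreIndex));
        return new IntPtr(1L << coreIndex);
    }

    // Upper bound on total CPU usage, in percent, for a process pinned to one core.
    public static double MaxCpuPercentOnOneCore(int coreCount) => 100.0 / coreCount;

    // Restricts every thread of the given process to a single core.
    public static void PinToCore(Process process, int coreIndex) =>
        process.ProcessorAffinity = SingleCoreMask(coreIndex);
}
```

MaxCpuPercentOnOneCore(8) gives 12.5, matching the 8-core figure above. Changing the affinity of another user's process typically requires elevated rights.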

Will Multi threading increase the speed of the calculation on Single Processor

On a single processor, will multi-threading increase the speed of a calculation? As we all know, multi-threading is used for increasing responsiveness to the user, which is achieved by separating the UI thread from the calculation thread. But let's talk only about a console application. Will multi-threading increase the speed of the calculation? Do we get the calculation result faster when we calculate using multiple threads?
What about multiple cores: will multi-threading increase the speed or not?
Please help me. If you have any material for learning more about threading, please post it.
Edit:
I have been asked a question: at any given time, only one thread is allowed to run on a single core. If so, why do people use multithreading in a console application?
Thanks in advance,
Harsha
In general terms, no it won't speed up anything.
Presumably the same work overall is being done, but now there is the overhead of additional threads and context switches.
On a single processor with HyperThreading (two virtual processors) then the answer becomes "maybe".
Finally, even though there is only one CPU, perhaps some of the threads can be pushed to the GPU or other hardware? This is kind of getting away from the "single processor" scenario, but it could technically be a way of achieving a speed increase from multithreading on a single-core PC.
Edit: your question now mentions multithreaded apps on a multicore machine.
Again, in very general terms, this will provide an overall speed increase to your calculation.
However, the increase (or lack thereof) will depend on how parallelizable the algorithm is, the contention for memory and cache, and the skill of the programmer when it comes to writing parallel code without locking or starvation issues.
Few threads on 1 CPU:
may increase performance in case you continue with another thread instead of waiting for I/O bound operation
may decrease performance if, let's say, there are too many threads and work is wasted on context switching
Few threads on N CPUs:
may increase performance if you are able to cut job in independent chunks and process them in independent manner
may decrease performance if you rely heavily on communication between threads and bus becomes a bottleneck.
So it's actually very task specific: some things can be parallelized very easily, while for others it's almost impossible. Perhaps it's somewhat advanced reading for a newcomer, but there are 2 great resources on this topic in the C# world:
Joe Duffy's web log
PFX team blog - they have a very good set of articles for parallel programming in .NET world including patterns and practices.
What is your calculation doing? You won't be able to speed it up by using multithreading if it is processor bound, but if for some reason your calculation writes to disk or waits for some other sort of IO, you may be able to improve performance using threading. However, when you say "calculation" I assume you mean some sort of processor-intensive algorithm, so adding threads is unlikely to help, and could even slow you down, as the context switch between threads adds extra work.
If the task is compute bound, threading will not make it faster unless the calculation can be split in multiple independent parts. Even so you will only be able to achieve any performance gains if you have multiple cores available. From the background in your question it will just add overhead.
However, you may still want to run any complex and long running calculations on a separate thread in order to keep the application responsive.
No, no and no.
Unless you write parallelizing code to take advantage of multicores, it will always be slower if you have no other blocking functions.
Exactly like the user input example, one thread might be waiting for a disk operation to complete, and other threads can take that CPU time.
As described in the other answers, multi-threading on a single core won't give you any extra performance (hyperthreading notwithstanding). However, if your machine sports an Nvidia GPU you should be able to use CUDA to push calculations to the GPU. See http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx and C#: Perform Operations on GPU, not CPU (Calculate Pi).
The answers above cover most of it.
Running multiple threads on one processor can increase performance if you can manage to get more work done at the same time, instead of letting the processor wait between different operations. However, it could also cause a severe loss of performance due to, for example, synchronization, or because the processor is overloaded and can't keep up with the requirements.
As for multiple cores, threading can improve the performance significantly. However, much depends on finding the hotspots and not overdoing it. Using threads everywhere, with the synchronization that requires, can even lower the performance. Optimizing with threads on multiple cores takes a lot of preliminary study and planning to get a good result. You need, for example, to think about how many threads to use in different situations. You do not want the threads to sit and wait for information used by another thread.
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf
https://computing.llnl.gov/tutorials/parallel_comp/
https://computing.llnl.gov/tutorials/pthreads/
http://en.wikipedia.org/wiki/Superscalar
http://en.wikipedia.org/wiki/Simultaneous_multithreading
I have been doing some intensive C++ mathematical simulation runs using 24 core servers. If I run 24 separate simulations in parallel on the 24 cores of a single server, then I get a runtime for each of my simulations of say X seconds.
The bizarre thing I have noticed is that, when running only 12 simulations, using 12 of the 24 cores, with the other 12 cores idle, then each of the simulations runs at a runtime of Y seconds, where Y is much greater than X! When viewing the task manager graph of the processor usage, it is obvious that a process does not stick to only one core, but alternates between a number of cores. That is to say, the switching between cores to use all the cores slows down the calculation process.
The way I maintained the runtime when running only 12 simulations, is to run another 12 "junk" simulations on the side, using the remaining 12 cores!
Conclusion: When using multi-cores, use them all at 100%, for lower utilisation, the runtime increases!
For a single-core CPU:
The performance actually depends on the job you are referring to.
In your case, for a calculation done by the CPU, overclocking would help if your motherboard supports it. Otherwise there is no way for the CPU to do calculations faster than the speed of the CPU.
For a multicore CPU:
As the above answers say, if properly designed, the performance may increase if all cores are fully used.
On a single-core CPU, if the threads are implemented at user level, then multithreading won't matter if there are blocking system calls in the thread, like an I/O operation, because the kernel won't know about the user-level threads.
So if the process does I/O, you can implement the threads in kernel space and then use different threads for different jobs.
(This answer is based on theory.)
Even a CPU-bound task might run faster multi-threaded if properly designed to take advantage of cache memory and the pipelining done by the processor. Modern processors spend a lot of time twiddling their thumbs, even when nominally fully "busy".
Imagine a process that used a small chunk of memory very intensively. Processing the same chunk of memory 1000 times would be much faster than processing 1000 chunks of similar memory.
You could certainly design a multi threaded program that would be faster than a single thread.
Threads don't increase performance. Threads sacrifice performance in favor of keeping parts of the code responsive.
The only exception is if you are doing a computation that is so parallelizable that you can run different threads on different cores (which is the exception, not the rule).
