Think of a load balancer that has to distribute work according to the available (remaining) processing power of its units. How would you calculate such a parameter so that machines can be compared?
I'm trying to implement this in C#, and so far I can query the CPU usage as a percentage, but that isn't enough on its own, since different machines may have different processors. Perhaps if I could determine each machine's processing power and multiply it by its free CPU percentage, that would give a reasonable estimate.
But which characteristics of a processor are the important ones to include, and how do I aggregate them into a single number?
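One rough way to approximate this in C#, assuming (and it is only an assumption) that core count times clock speed is an acceptable stand-in for raw capacity, is to combine WMI processor data with the standard "% Processor Time" performance counter. It ignores architecture, cache sizes, turbo states and so on, so treat it as a starting point rather than a definitive metric:

    using System;
    using System.Diagnostics;
    using System.Management;   // add a reference to System.Management
    using System.Threading;

    class CapacityEstimator
    {
        // Returns a crude "free capacity" score: total core-MHz scaled by idle CPU.
        public static double EstimateFreeCapacity()
        {
            // Sum clock speed across physical cores via WMI.
            double totalMhz = 0;
            var searcher = new ManagementObjectSearcher(
                "SELECT MaxClockSpeed, NumberOfCores FROM Win32_Processor");
            foreach (ManagementObject cpu in searcher.Get())
                totalMhz += (uint)cpu["MaxClockSpeed"] * (uint)cpu["NumberOfCores"];

            // Current overall CPU usage in percent.
            using (var counter = new PerformanceCounter("Processor", "% Processor Time", "_Total"))
            {
                counter.NextValue();   // the first sample always reads 0
                Thread.Sleep(1000);    // give the counter time to accumulate
                double usagePercent = counter.NextValue();
                return totalMhz * (1.0 - usagePercent / 100.0);
            }
        }
    }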
I have a single-threaded console application.
I am confused about the concept of CPU usage. Should a good single-threaded application use ~100% of the CPU (since it is available), or should it avoid using a lot of CPU (since that can slow the computer down)?
I have done some research but haven't found an answer to my confusion. I am a student and still learning, so any feedback will be appreciated. Thanks.
It depends on what the program needs the CPU for. If it has to do a lot of work, it's common to use all of one core for some period of time. If it spends most of its time waiting for input, it will naturally tend to use the CPU less frequently. I say "less frequently" instead of "less" because:
Single threaded programs are, at any given time, either running, or they're not, so they are always using either 100% or 0% of one CPU core. Programs that appear to be only using 50% or 30% or whatever are actually just balancing periods of computational work with periods of waiting for input. Devices like hard drives are very slow compared to the CPU, so a program that's reading a lot of data from disk will use less CPU resources than one that crunches lots of numbers.
It's normal for a program to use 100% of the CPU sometimes, often even for a long time, but it's not polite to use it if you don't need it (i.e. busy-looping). Such behavior crowds out other programs that could be using the CPU.
The same goes for the hard drive. People forget that the hard drive is a finite resource too, mostly because the Task Manager doesn't show hard drive usage as a percentage. It's difficult to gauge hard drive usage as a percentage of the total, since disk accesses don't have a fixed speed the way the processor does. However, it takes much longer to move 1 GB of data on disk than it does to use the CPU to move 1 GB of data in memory, and the performance impact of HDD hogging is as bad as or worse than that of CPU hogging: it tends to slow your system to a crawl without any visible CPU usage, as you have probably seen before.
Chances are that any small academic programs you write at first will use all of one core for a short period of time and then wait. Simple things like prompting for a number at the command prompt are the waiting part, and doing whatever academic operation on it afterwards is the active part.
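To make the 0%-versus-100% point concrete, here is a tiny illustration of the two kinds of "waiting" described above: a blocking call that uses essentially no CPU, and a busy loop that pins one core. The two-second spin is purely for demonstration.

    using System;

    class WaitDemo
    {
        static void Main()
        {
            // Polite: the thread blocks inside the OS until input arrives (~0% CPU).
            Console.Write("Enter a number: ");
            string line = Console.ReadLine();

            // Impolite: a busy loop burns a full core while doing nothing useful.
            var stopAt = DateTime.UtcNow.AddSeconds(2);
            while (DateTime.UtcNow < stopAt)
            {
                // spinning on purpose -- watch Task Manager jump to 100% of one core
            }

            Console.WriteLine("You entered: " + line);
        }
    }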
It depends on what it's doing. Different types of operations have different needs.
There is no objective way to answer this question that applies across the board.
The only answer that's true is "it should use only the amount of CPU necessary to do the job, and no more."
In other words, optimize as much as you can and as is reasonable. In general, the lower the CPU usage the better: the faster the program will perform, the less it will crash, and the less it will annoy your users.
Typically, an algorithmically heavy task such as predicting the weather will have to be managed by the OS, because it will need all of the CPU for as much time as it is allowed to run (until it's done).
On the other hand, a graphical application with a static user interface, like a Windows Forms application for storing a bit of data for record-keeping, should require very low CPU usage, since it is mainly waiting for the user to do something.
I'm trying to build tests around processor cache-line optimizations relative to parallel processing. Specifically, I'm testing how segments of my products are being impacted by false sharing inefficiencies. To do this, I need to be able to determine my processor's cache sector size (e.g. 64 bytes) so I can contrive tests with the appropriate object size ranges. So... how or where can I get this information (e.g. a processor spec page, a C# API call, etc.)? Cache sector size is also known as cache line size.
Note: I looked at the Intel spec page for my i7 processor and can't find these details, or maybe I just don't recognize them.
I have done a similar experiment. I use CPU-Z and find it extremely helpful; it gives detailed information about CPU cores, caches (L1, L2, etc.), and so on.
My suggestion: don't be distracted too much by hardware specs; focus on benchmarking, because your experiment is going to take a lot of time.
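If benchmarking is the route you take, a crude probe like the one below can confirm whether a guessed line size (64 bytes is assumed here, not detected) is in the right ballpark: two threads hammer counters that are either adjacent or padded well apart, and a large timing gap between the two runs points to false sharing.

    using System;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    class FalseSharingProbe
    {
        const long Iterations = 50_000_000;

        // stride = 1  -> the two counters almost certainly share a cache line
        // stride = 16 -> the counters are 128 bytes apart, i.e. on different 64-byte lines
        static long Run(int stride)
        {
            var counters = new long[stride + 1];
            var sw = Stopwatch.StartNew();
            Parallel.Invoke(
                () => { for (long i = 0; i < Iterations; i++) Interlocked.Increment(ref counters[0]); },
                () => { for (long i = 0; i < Iterations; i++) Interlocked.Increment(ref counters[stride]); });
            sw.Stop();
            return sw.ElapsedMilliseconds;
        }

        static void Main()
        {
            Console.WriteLine("adjacent counters:        " + Run(1) + " ms");
            Console.WriteLine("counters 128 bytes apart: " + Run(16) + " ms");
        }
    }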
I programmed my own string matching algorithm, and I want to measure its time accurately so I can compare it with other algorithms and check whether my implementation is better.
I tried Stopwatch, but it gives a different time in each run because of the other processes running on Windows. I have heard about RDTSC, which can return the number of cycles consumed, but I don't know whether it also gives a different cycle count on each execution.
Please help me: can RDTSC give an accurate and repeatable cycle measurement for a C# function, or does it behave like Stopwatch? What is the best way to get the cycle count for a C# function alone, without interference from the other running processes? Thanks a lot for any help or hints.
it gives a different time in each run because of the other processes running on Windows
That is in the nature of all benchmarks.
Good benchmarks offset this by statistical means, i.e. by measuring often enough to offset any side effects from other running programs. This is the way to go. As far as precision goes, Stopwatch is more than enough for benchmarks.
This requires several things (without getting into statistical details, which I’m not too good at either):
An individual measurement should last long enough to offset imprecision introduced by the measuring method (even RDTSC isn't completely precise), and to offset calling overhead. After all, you want to measure your algorithm, not the time it takes to run the testing loop and invoke your testing method.
Enough test runs to have confidence in the result: the more data, the higher the robustness of your statistic.
Minimize external influences, in particular systematic bias. That is to say, run all your tests on the same machine under same conditions, otherwise the results cannot be compared. At all.
Furthermore, if you do multiple runs of your tests (and you should!), interleave the different methods. A rough harness along these lines is sketched below.
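A minimal sketch of such a harness, assuming Stopwatch is the timer and that the median over many interleaved runs is an acceptable summary statistic (the run and iteration counts are placeholders to tune for your own algorithm, and MyMatcher/NaiveMatcher in the usage comment are hypothetical methods under test):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    static class Bench
    {
        // Runs every candidate `runs` times, interleaved, and reports the
        // median time per call in milliseconds.
        public static Dictionary<string, double> MedianMs(
            Dictionary<string, Action> candidates, int runs = 30, int callsPerRun = 1000)
        {
            foreach (var warmup in candidates.Values) warmup();   // JIT / cache warm-up

            var samples = candidates.Keys.ToDictionary(k => k, _ => new List<double>(runs));
            for (int r = 0; r < runs; r++)              // interleave: one run of each per pass
            {
                foreach (var candidate in candidates)
                {
                    var sw = Stopwatch.StartNew();
                    for (int i = 0; i < callsPerRun; i++) candidate.Value();
                    sw.Stop();
                    samples[candidate.Key].Add(sw.Elapsed.TotalMilliseconds / callsPerRun);
                }
            }

            return samples.ToDictionary(
                kv => kv.Key,
                kv => { kv.Value.Sort(); return kv.Value[kv.Value.Count / 2]; });   // median
        }
    }

    // Usage:
    // var results = Bench.MedianMs(new Dictionary<string, Action>
    // {
    //     ["mine"]  = () => MyMatcher(pattern, text),
    //     ["naive"] = () => NaiveMatcher(pattern, text),
    // });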
I think that to get the most accurate figures you should interop with GetThreadTimes():
http://msdn.microsoft.com/en-us/library/ms683237%28v=vs.85%29.aspx
The linked page includes the signature needed to call the function from C#.
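For reference, a hedged sketch of that interop (kernel and user times are updated at roughly 15.6 ms granularity, so the work you measure needs to run well beyond that; RunMyStringMatcher is a hypothetical stand-in for the code under test):

    using System;
    using System.Runtime.InteropServices;

    static class ThreadCpuTime
    {
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr GetCurrentThread();

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool GetThreadTimes(IntPtr hThread,
            out long creationTime, out long exitTime,
            out long kernelTime, out long userTime);

        // CPU time (kernel + user) consumed so far by the calling thread, in milliseconds.
        public static double CurrentThreadCpuMs()
        {
            if (!GetThreadTimes(GetCurrentThread(),
                    out _, out _, out long kernel, out long user))
                throw new InvalidOperationException("GetThreadTimes failed");
            return (kernel + user) / 10000.0;   // FILETIME units are 100-nanosecond ticks
        }
    }

    // Usage:
    // double before = ThreadCpuTime.CurrentThreadCpuMs();
    // RunMyStringMatcher();
    // double cpuMs = ThreadCpuTime.CurrentThreadCpuMs() - before;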
I am tasked with building an application wherein the business users will define a number of rules for data manipulation and processing (e.g. taking one numerical value and splitting it equally among a number of records selected on the basis of a condition specified in the rule).
On a monthly basis, a batch application has to be run in order to process around half a million records as per the rules defined. Each record has around 100 fields. The environment is .NET, C# and SQL Server with a third-party rule engine.
Could you please suggest how to go about defining and/or ascertaining what kind of hardware will be best suited, if the requirement is to process the records within a timeframe of, say, around 8 to 10 hours? How will the specs vary if the user wants to either increase or decrease the timeframe depending on hardware costs?
Thanks in advance
Abby
Create the application and profile it?
Step 0. Create the application. It is impossible to tell the real-world performance of a multi-computer system like the one you're describing from "paper" specifications... You need to try it and see what causes the biggest slowdowns... Traditionally this is physical IO, but not always...
Step 1. Profile with sample sets of data in an isolated environment. This is a gross metric. You're not trying to isolate what takes the time, just measuring the overall time it takes to run the rules.
What does isolated environment mean? You want to use the same sorts of network hardware between the machines, but do not allow any other traffic on that network segment. That introduces too many variables at this point.
What does profile mean? With current hardware, measure how long it takes to complete under the following circumstances. Write a program to automate the data generation.
Scenario 1. 1,000 of the simplest rules possible.
Scenario 2. 1,000 of the most complex rules you can reasonably expect users to enter.
Scenarios 3 & 4. 10,000 Simplest and most complex.
Scenarios 5 & 6. 25,000 Simplest and Most complex
Scenarios 7 & 8. 50,000 Simplest and Most complex
Scenarios 9 & 10. 100,000 Simplest and Most complex
Step 2. Analyze the data.
See if there are trends in completion time. Figure out whether they appear tied strictly to the volume of rules or whether the complexity also factors in... I assume it will.
Develop a trend line that shows how long you can expect the run to take with 200,000 and 500,000 rules (a simple extrapolation sketch follows this answer). Perform another run at 200,000. See if the trend line is correct; if not, revise your method of developing the trend line.
Step 3. Measure the database and network activity as the system processes the larger rule sets. See if there is more activity with more rules. If so, the more you speed up throughput to and from the SQL Server, the faster the system will run.
If these are "relatively low," then CPU and RAM speed are likely where you'll want to beef up the requested machines specification...
Of course if all this testing is going to cost your employer more than buying the beefiest server hardware possible, just quantify the cost of the time spent testing vs. the cost of buying the best server and being done with it and only tweaking your app and the SQL that you control to improve performance...
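For what it's worth, here is a small illustration of the Step 2 trend line using an ordinary least-squares fit. The rule counts follow the scenarios above, but the timings are made up; substitute your own measurements, and bear in mind the relationship may well not be linear once rule complexity factors in.

    using System;
    using System.Linq;

    class TrendLine
    {
        // Ordinary least-squares fit of y = slope * x + intercept.
        static (double slope, double intercept) Fit(double[] x, double[] y)
        {
            double mx = x.Average(), my = y.Average();
            double slope = x.Zip(y, (xi, yi) => (xi - mx) * (yi - my)).Sum()
                         / x.Sum(xi => (xi - mx) * (xi - mx));
            return (slope, my - slope * mx);
        }

        static void Main()
        {
            double[] rules   = { 1000, 10000, 25000, 50000, 100000 };
            double[] minutes = { 2, 18, 45, 95, 190 };   // placeholder timings, not real data

            var (m, b) = Fit(rules, minutes);
            Console.WriteLine("Predicted at 200,000 rules: " + (m * 200000 + b).ToString("F0") + " min");
            Console.WriteLine("Predicted at 500,000 rules: " + (m * 500000 + b).ToString("F0") + " min");
        }
    }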
If this system is not the first of its kind, you can consider the following:
Re-use (after additional evaluation) hardware requirements from previous projects
Evaluate hardware requirements based on workload and hardware configuration of existing application
If that is not the case and the performance requirements are important, then the best approach would be to create a prototype with, say, 10 rules implemented. Process the dataset using the prototype and extrapolate to the full rule set. Based on this information you should be able to derive initial performance and hardware requirements. Then you can fine-tune these specifications, taking into account planned growth in processed data volume, scalability requirements and redundancy.
I have an environment that serves many devices spread across 3 time zones by receiving and sending data during the wee hours of the night. The distribution of these devices was determined pseudo-randomly, based on an identification number and a simple modulo calculation. That calculation creates an unnecessary artificial peak which consumes more resources than I'd like during certain hours of the night.
As part of our protocol I can instruct devices when to connect to our system on subsequent nights.
I am looking for an algorithm that can spread the peak into a more level line (albeit one that is generally higher at most times), or at least a shove in the right direction: what sort of terminology should I spend my time reading about? The inputs available for the calculation are the device identification numbers, the current time, and each device's time zone. I can also perform some up-front analytical calculations to create pools of slots to draw from, though that approach feels less elegant than I'm hoping for (though a learning algorithm may not be a bad thing...).
(Ultimately and somewhat less relevant I will be implementing this algorithm using C#.)
If you want to avoid the spikes associated with using random times, look at the various hash functions used for hash tables. Your reading might start with the Wikipedia article on the subject:
http://en.wikipedia.org/wiki/Hash_function
Basically, divide whatever you want your update window to be into an appropriate number of buckets. One option might be 3 hours * 60 minutes * 60 seconds = 10,800 buckets. Then use that as your hash table size for the chosen hash function, with the device ID as the unique input. Don't forget to use GMT for the chosen time. Your programming language of choice probably has a number of built-in hash functions, but the article should provide some links to get you started if you want to implement one from scratch.
This approach is superior to the earlier answer of random access times because it has much better evenness properties, and ensures that your access patterns will be approximately flat, as compared to the random function which is likely to sometimes exhibit spikes.
Here's some more specific information on how to implement various functions:
http://www.partow.net/programming/hashfunctions/index.html
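As an illustration of the bucket idea, here is a minimal sketch; the window start time, the one-second slot size, and the choice of FNV-1a are all arbitrary, and any well-distributed hash would do.

    using System;

    static class ConnectSlots
    {
        const int Buckets = 3 * 60 * 60;   // 10,800 one-second slots in a 3-hour window

        // Simple 32-bit FNV-1a hash of the device ID.
        static uint Fnv1a(string s)
        {
            uint hash = 2166136261;
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }

        // Deterministically maps a device ID to a connect time inside the window (GMT).
        public static DateTime SlotFor(string deviceId, DateTime windowStartUtc)
        {
            int offsetSeconds = (int)(Fnv1a(deviceId) % Buckets);
            return windowStartUtc.AddSeconds(offsetSeconds);
        }
    }

    // Usage: every night each device gets the same, evenly spread slot.
    // var windowStart = new DateTime(2024, 1, 2, 1, 0, 0, DateTimeKind.Utc);   // 01:00 GMT, for example
    // DateTime when = ConnectSlots.SlotFor("device-12345", windowStart);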
You say that you can tell devices what time to connect, so I don't see why you need anything random or modulo-based. When each device connects, pick a time tomorrow which currently doesn't have many devices assigned to it, and assign the device to that time. If the devices all take about the same amount of resources to service, a trivial greedy algorithm will produce a completely smooth distribution: assign each device to whatever time is currently least congested (a sketch follows below). If the server handles other work besides these devices, you would want to start with its typical load profile and add the device load on top of that. I wouldn't really call this "analytical calculations", just storing a histogram of expected load against time for the next 24 hours.
Or do you have the problem that a device might not obey instructions (for example, it might be offline at its assigned time and then connect whenever it next comes online)? Obviously, if your users in a particular time zone all start work at the same time in the morning, that would make this strategy problematic.
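A minimal version of that greedy idea, assuming a 15-minute slot granularity (arbitrary) and that every device costs roughly the same to service; if the server does other work, seed the load array with its baseline profile instead of zeros.

    using System;

    class GreedyScheduler
    {
        readonly int[] load;                // devices (or expected cost) already booked per slot
        readonly DateTime windowStartUtc;
        readonly TimeSpan slotLength = TimeSpan.FromMinutes(15);

        public GreedyScheduler(DateTime windowStartUtc, int slotCount)
        {
            this.windowStartUtc = windowStartUtc;
            load = new int[slotCount];      // optionally pre-fill with the server's baseline load
        }

        // Called when a device checks in tonight; returns its slot for tomorrow night.
        public DateTime AssignNext()
        {
            int best = 0;
            for (int i = 1; i < load.Length; i++)
                if (load[i] < load[best]) best = i;
            load[best]++;
            return windowStartUtc + TimeSpan.FromTicks(slotLength.Ticks * best);
        }
    }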
Simply take the number of devices, n, divide your time interval into n equal segments, and allocate one segment to each device, informing each device of its assigned time the next time it connects.
This will give you an optimally uniform distribution in all cases.
Normalize all times to GMT; why would you care about time zones, daylight saving time, or anything else? Now is now, no matter what time zone you're in.
Adding a random distribution can lead to clumping (a uniform random distribution is only uniform in the limit, not necessarily for any particular sample), and really should only be used if there's no feedback mechanism. Since you can control to some extent when the devices connect, a random component is not at all necessary and is not even remotely optimal.
If you're concerned about clock drift across devices, consider that adding randomness would not reduce the clock drift in any way; it would only contribute to an even less optimal allocation.
If you want to maintain a stable distribution of devices by region, compute the ratio of devices per region and distribute the slot allocations accordingly. For instance, if the split is 50/25/25 by time zone, assign two slots to the first time zone, then one slot to each of the remaining time zones, and repeat.
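For example, the 50/25/25 case produces the repeating slot pattern TZ1, TZ1, TZ2, TZ3. A small generator along those lines (the region names and weights are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class RegionInterleaver
    {
        // Yields an endless slot-ownership pattern proportional to the given weights.
        public static IEnumerable<string> SlotPattern(IDictionary<string, int> weights)
        {
            while (true)
                foreach (var region in weights)
                    for (int i = 0; i < region.Value; i++)
                        yield return region.Key;
        }
    }

    // Usage: with weights 2/1/1 the first eight slots go to
    // TZ1, TZ1, TZ2, TZ3, TZ1, TZ1, TZ2, TZ3.
    // foreach (var owner in RegionInterleaver.SlotPattern(
    //         new Dictionary<string, int> { ["TZ1"] = 2, ["TZ2"] = 1, ["TZ3"] = 1 }).Take(8))
    //     Console.WriteLine(owner);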