Running maximum threads: Automatic performance adjustment


I'm developing an app that scans thousands of copies of a struct (~1 GB of RAM). Speed is important.
ParallelScan(_from, _to); // In a new thread
I manually adjust the thread count:
if (myStructs.Count == 0) { threads = 0; }
else if (myStructs.Count < 1 * Number.Thousand) { threads = 1; }
else if (myStructs.Count < 3 * Number.Thousand) { threads = 2; }
else if (myStructs.Count < 5 * Number.Thousand) { threads = 4; }
else if (myStructs.Count < 10 * Number.Thousand) { threads = 8; }
else if (myStructs.Count < 20 * Number.Thousand) { threads = 12; }
else if (myStructs.Count < 30 * Number.Thousand) { threads = 20; }
else if (myStructs.Count < 50 * Number.Thousand) { threads = 30; }
else threads = 40;
I came up with these thresholds from scratch, and I would have to retune them for another CPU, etc. I think I could write smarter code that dynamically starts a new thread whenever CPU capacity is available:
If the CPU is not at 100%, start N threads
Measure CPU usage or thread processing time and adjust/estimate N
Loop until the whole struct array has been scanned
Has anyone done something similar, or does anyone have a better idea?
UPDATE: The solution
Parallel.For(0, myStructs.Count, (x) => // the upper bound is exclusive, so Count (not Count - 1) covers every element
{
ParallelScan(x, x); // Will be ParallelScan(x);
});
I trimmed tons of code. Thanks, everyone!
UPDATE 2: Results
Scan time for 10K templates
1 Thread: 500 ms
10 Threads: 300 ms
40 Threads: 600 ms
Tasks: 100 ms

The standard answer: use Tasks (the TPL), not Threads. Tasks require .NET Framework 4.
Your ParallelScan could just use Parallel.ForEach(...) or PLINQ (.AsParallel()).
The TPL includes a scheduler, and ForEach() uses a partitioner, to adapt to the number of CPU cores and the current load. Your problem is most likely solved by the standard components, but you can also write custom schedulers and partitioners.
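A minimal sketch of the PLINQ route, assuming a hypothetical Template struct and a hypothetical Matches predicate standing in for whatever ParallelScan actually checks. PLINQ decides how to split the array across cores, so there is no hand-tuned thread table:

```csharp
using System;
using System.Linq;

// Hypothetical struct standing in for the question's scanned structs.
public struct Template
{
    public int Id;
    public int Score;
}

public static class PlinqScan
{
    // Hypothetical predicate; replace with the real per-struct scan logic.
    static bool Matches(Template t) => t.Score > 50;

    // AsParallel() lets PLINQ partition the array across the available cores
    // and combine the per-partition counts; no manual thread count is needed.
    public static int CountMatches(Template[] items) =>
        items.AsParallel().Count(t => Matches(t));
}
```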

Actually, you won't get much benefit from spawning 50 threads if your CPU has only two cores (even if each of them supports hyper-threading). It will actually run slower because of the context switching that occurs between them.
That is why you should go for the Task Parallel Library (.NET 4), which makes sure all available cores are used efficiently.
Apart from that, improving the asymptotic complexity of your search algorithm might prove more valuable for large quantities of data, regardless of Moore's law.
[Edit]
If you are unable or unwilling to use the .NET 4 TPL, you can start by getting the current number of logical processors in the system (use Environment.ProcessorCount, or see this answer for detailed info). Based on that number, you can partition your data and spawn a fixed number of threads. That is much simpler than checking CPU utilization, and it should prevent creating unnecessary threads that would be starved anyway.
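A minimal sketch of that fallback, assuming a hypothetical scanRange(start, endExclusive) delegate in place of the question's ParallelScan(_from, _to): one contiguous chunk per logical processor, each on its own Thread, then Join on all of them:

```csharp
using System;
using System.Threading;

public static class FixedThreadScan
{
    // Splits [0, count) into one contiguous chunk per logical processor and runs
    // scanRange(start, endExclusive) for each chunk on its own thread.
    // scanRange is a hypothetical stand-in for ParallelScan(_from, _to).
    public static void Run(int count, Action<int, int> scanRange)
    {
        if (count <= 0) return;

        int threadCount = Math.Min(Environment.ProcessorCount, count);
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            // Even split; long math avoids overflow, and the last chunk ends at count.
            int start = (int)((long)t * count / threadCount);
            int end = (int)((long)(t + 1) * count / threadCount);
            threads[t] = new Thread(() => scanRange(start, end));
            threads[t].Start();
        }

        foreach (var thread in threads)
            thread.Join(); // wait until every chunk has been scanned
    }
}
```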

OK, sorry to keep going on, but first, to compile my comments:
Unless you have a very, very, very, good reason to think that scanning these structs will take any more than a handful of microseconds and that really, really, really matters, it's not a good idea to do this kind of optimisation. If you really want to do it, you should have one thread per core. But really - don't. If it's just 50,000 structs and you're doing something simple with them, don't bother.
FYI, starting a new thread is relatively expensive: it costs far more than handing work to a thread that already exists in the thread pool.
How long does this operation take? It's very unlikely that optimizing the multithreading like this is useful for you; it will give you the smallest improvement. You'll gain more from a better algorithm, or from not depending on this home-grown multithreading scheme.
I'm confused about your performance fixation partly because you say you're looking through 50,000 structs (a very quick and easy operation) and partly because you're using structs. Unless boxed, a struct is a value type, so if you're passing them between threads you're copying data rather than references, i.e. using more memory. My point is that that's a lot of data/memory unless the structs are small, and if they are small, what kind of processing can you possibly be doing on them that takes long enough to justify 40+ threads in parallel?
If performance is truly incredibly important and your goal, and you're not simply trying to do this as a nice engineering exercise, please share information about what kind of processing you're doing.

Related

Why does the following C# program use a limited number (10) of threads? [duplicate]

I just wrote a multithreading sample based on this link:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, (i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It reports 15 threads before Parallel.For and only 17 after it, so only 2 threads seem to be used by Parallel.For.
Then I created another sample based on this link:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options,(i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the code above, I set MaxDegreeOfParallelism (to 40 on my machine), but Parallel.For still uses the same number of threads.
So how can I increase the number of threads Parallel.For uses?

I'm facing a problem where some numbers are skipped inside the Parallel.For when I do heavy, complex work inside it. So I want to increase the maximum number of threads to get around the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong: running count++ on several threads at once is a data race, so increments get lost. So I think the first thing you should do is learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make it efficient. What you need is to lock only for a short amount of time each iteration.
There are other ways to achieve thread safety, including Interlocked, the overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), such as PLINQ or TPL Dataflow.
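Two of those options applied to the count++ placeholder from the question, as a minimal sketch (the class and method names are made up for illustration):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SafeCounting
{
    // Option 1: Interlocked makes each increment atomic, so no update is lost.
    public static int CountWithInterlocked(int n)
    {
        int count = 0;
        Parallel.For(0, n, i => Interlocked.Increment(ref count));
        return count;
    }

    // Option 2: the thread-local overload of Parallel.For. Each worker counts into
    // its own subtotal, and only the subtotals are combined at the end, so the
    // shared variable is touched once per worker instead of once per iteration.
    public static int CountWithThreadLocal(int n)
    {
        int count = 0;
        Parallel.For(0, n,
            () => 0,                                           // localInit: per-worker subtotal
            (i, state, subtotal) => subtotal + 1,              // body: touches no shared state
            subtotal => Interlocked.Add(ref count, subtotal)); // localFinally: combine once
        return count;
    }
}
```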
Only after you've made sure your code is thread-safe is it time to worry about things like the number of threads. On that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will actually usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool and it's quite possible that there already are some threads in the pool before the loop begins.
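If you want to see how many threads actually ran iterations, one approach (a sketch, not the only way) is to record Environment.CurrentManagedThreadId inside the loop body. Note that MaxDegreeOfParallelism is only an upper limit on concurrency; it does not force the loop to use that many threads. The helper name is hypothetical:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ThreadUsage
{
    // Counts the distinct managed threads that executed at least one iteration.
    // Unlike Process.Threads.Count, this ignores threads that merely existed
    // (in the pool or elsewhere) before the loop started.
    public static int DistinctWorkerThreads(int iterations, int maxDegree)
    {
        var seen = new ConcurrentDictionary<int, bool>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.For(0, iterations, options, i =>
        {
            seen.TryAdd(Environment.CurrentManagedThreadId, true);
        });
        return seen.Count;
    }
}
```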

Parallel loops run on the CPU's hardware cores. If your CPU has 2 cores, that is the maximum degree of hardware parallelism you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing

Parallel loops will give you wrong results for summation without locks, because every iteration updates the single shared variable count, and its value inside a parallel loop is not predictable. However, taking a lock in every iteration serializes the loop, so you don't achieve real parallelism. So you should test the parallel loop with something other than summation.

Why was the parallel version slower than the sequential version in this example?

I've been learning a little about parallelism in the last few days, and I came across this example.
I put it side by side with a sequential for loop like this:
private static void NoParallelTest()
{
int[] nums = Enumerable.Range(0, 1000000).ToArray();
long total = 0;
var watch = Stopwatch.StartNew();
for (int i = 0; i < nums.Length; i++)
{
total += nums[i];
}
Console.WriteLine("NoParallel");
Console.WriteLine(watch.ElapsedMilliseconds);
Console.WriteLine("The total is {0}", total);
}
I was surprised to see that the NoParallel method finished much faster than the parallel example given on the site.
I have an i5 PC.
I really thought that the Parallel method would finish faster.
Is there a reasonable explanation for this? Maybe I misunderstood something?

The sequential version was faster because the time spent doing operations on each iteration in your example is very small and there is a fairly significant overhead involved with creating and managing multiple threads.
Parallel programming only increases efficiency when each iteration is sufficiently expensive in terms of processor time.

I think that's because the loop performs a very simple, very fast operation.
In the case of the non-parallel version that's all it does. But the parallel version has to invoke a delegate. Invoking a delegate is quite fast and usually you don't have to worry how often you do that. But in this extreme case, it's what makes the difference. I can easily imagine that invoking a delegate will be, say, ten times slower (or more, I have no idea what the exact ratio is) than adding a number from an array.
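Both answers point at per-iteration overhead. A common way to amortize it is range partitioning: Partitioner.Create(0, n) hands each worker a block of indices, the body runs a plain for loop over that block, and thread-local subtotals avoid contention on the shared total. A sketch under those assumptions (the class name is hypothetical; timings vary by machine, and for work this cheap the sequential loop may still win):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Invokes the loop delegate once per index *range* rather than once per element.
    public static long Sum(int[] nums)
    {
        if (nums.Length == 0) return 0; // Partitioner.Create rejects an empty range

        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, nums.Length),   // ranges as Tuple<int, int>: [Item1, Item2)
            () => 0L,                             // per-worker subtotal
            (range, state, subtotal) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    subtotal += nums[i];          // plain sequential loop inside the range
                return subtotal;
            },
            subtotal => Interlocked.Add(ref total, subtotal)); // combine once per worker
        return total;
    }
}
```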

C# System CPU Usage and syncing with Windows Task Manager

This is a two-part question. I wanted to post my code here on Stack Overflow to help others with the same task.
Question 1:
I have some code which, I believe, correctly measures CPU usage (across all the cores in the system, based on the times retrieved) over the measurement interval; I use 1 second in the thread call.
I had to piece this together from the very few articles on the web and from C++ code. So, question 1: is what I have done correct?
Sometimes the value returned is negative, which is why I multiply it by -1. Again, since there is very little documentation, I am assuming that this is what I should be doing.
I have the following code:
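using System;
using System.Runtime.InteropServices;
using ComTypes = System.Runtime.InteropServices.ComTypes;
public static class Processor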
{
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool GetSystemTimes(out ComTypes.FILETIME lpIdleTime, out ComTypes.FILETIME lpKernelTime, out ComTypes.FILETIME lpUserTime);
private static TimeSpan _sysIdleOldTs;
private static TimeSpan _sysKernelOldTs;
private static TimeSpan _sysUserOldTs;
static Processor()
{
}
public static void Test()
{
ComTypes.FILETIME sysIdle, sysKernel, sysUser;
if(GetSystemTimes(out sysIdle, out sysKernel, out sysUser))
{
TimeSpan sysIdleTs = GetTimeSpanFromFileTime(sysIdle);
TimeSpan sysKernelTs = GetTimeSpanFromFileTime(sysKernel);
TimeSpan sysUserTs = GetTimeSpanFromFileTime(sysUser);
TimeSpan sysIdleDifferenceTs = sysIdleTs.Subtract(_sysIdleOldTs);
TimeSpan sysKernelDifferenceTs = sysKernelTs.Subtract(_sysKernelOldTs);
TimeSpan sysUserDifferenceTs = sysUserTs.Subtract(_sysUserOldTs);
_sysIdleOldTs = sysIdleTs;
_sysKernelOldTs = sysKernelTs;
_sysUserOldTs = sysUserTs;
TimeSpan system = sysKernelDifferenceTs.Add(sysUserDifferenceTs);
Double cpuUsage = (((system.Subtract(sysIdleDifferenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
if (cpuUsage < 0)
{
Console.WriteLine("CPU: " + ((int) (cpuUsage)*-1) + "%");
}
else
{
Console.WriteLine("CPU: " + (int) (cpuUsage) + "%");
}
Console.WriteLine("");
}
else
{
Console.WriteLine("Couldn't get CPU usage!");
Console.WriteLine("");
}
}
private static TimeSpan GetTimeSpanFromFileTime(ComTypes.FILETIME time)
{
// FILETIME counts 100-nanosecond intervals, the same unit as TimeSpan ticks
return TimeSpan.FromTicks((long)(((ulong)(uint)time.dwHighDateTime << 32) | (uint)time.dwLowDateTime));
}
}
Question 2:
Is there any way for me to sync a thread in my program with the Windows Task Manager, so that measurements such as CPU usage from the above code match?
What I mean is, if you open Windows Task Manager, you will notice that it polls every second (in reality it doesn't need to poll more often than that). What I want to do is match that timing with my thread.
So when Windows Task Manager polls, my thread polls.
Some notes:
I didn't want to use Performance Counters or .NET built-in methods. In fact, from what I have read, I believe .NET has no method for calculating machine-wide CPU usage other than performance counters.
Performance counters have overhead and also make the GC heap grow, not to mention the delay before the next result is available. While my software does not need real-time performance, I do need it to be as responsive as possible and to use as little CPU time as possible. The above code can be called and returns in less than a millisecond; in fact, on my development machine, the time-span difference shows 0 ms. I don't believe Performance Counters are as responsive.
In case you are curious, my software gathers a number of items (CPU, memory, Event Log items, etc.), all of which need to be gathered and stored in SQL CE before the next poll, 1 second later. Each task/item runs on its own thread to make this possible.
Also, the code above is not optimized in any way, and you will notice I have not commented it yet. The reason is that I want to make sure it is correct before optimizing it.
Update 1
As per a comment I made below, I removed the extra "System" TimeSpan, as it is not required, and modified the line that calculates the "CPU Usage" and cast it appropriately.
int cpuUsage = (int)(((sysKernelDifferenceTs.Add(sysUserDifferenceTs).Subtract(sysIdleDifferenceTs).TotalMilliseconds) * 100.00) / sysKernelDifferenceTs.Add(sysUserDifferenceTs).TotalMilliseconds);
Though I am still unsure of the formula. While it seems to be highly accurate, it does occasionally return a negative figure, which is why I multiply it by -1 in that case. After all, there is no such thing as -2% CPU usage.
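A sketch of the calculation as a pure function, assuming (as the GetSystemTimes documentation states) that the kernel time already includes idle time, so busy time is kernel + user - idle out of a total of kernel + user. It converts FILETIME through TimeSpan ticks, since both count 100-nanosecond intervals, and clamps the result to 0-100 instead of negating negative values. The class and method names are hypothetical:

```csharp
using System;
using System.Runtime.InteropServices.ComTypes;

public static class CpuUsageMath
{
    // FILETIME and TimeSpan ticks are both 100-nanosecond units: no scaling needed.
    public static TimeSpan ToTimeSpan(FILETIME ft) =>
        TimeSpan.FromTicks((long)(((ulong)(uint)ft.dwHighDateTime << 32) | (uint)ft.dwLowDateTime));

    // Machine-wide CPU usage (0-100) between two GetSystemTimes samples.
    // Assumes kernel time includes idle time, per the GetSystemTimes docs.
    public static double Percent(
        TimeSpan idleOld, TimeSpan kernelOld, TimeSpan userOld,
        TimeSpan idleNew, TimeSpan kernelNew, TimeSpan userNew)
    {
        long idle = (idleNew - idleOld).Ticks;
        long total = (kernelNew - kernelOld).Ticks + (userNew - userOld).Ticks;
        if (total <= 0) return 0.0;                     // no time elapsed between samples
        double percent = 100.0 * (total - idle) / total;
        return Math.Max(0.0, Math.Min(100.0, percent)); // clamp instead of negating
    }
}
```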
Update 2
So I did a simple test using "System.Diagnostics.PerformanceCounter". While it is incredibly handy and does exactly what it is intended to do, it does create overhead.
Here are my observations:
The performance counter took much longer to initialize: roughly three seconds longer on my 2.6 GHz i7.
The performance counter also seemed to add approximately another 5 MB of RAM usage just by using it. What I mean is: with the code above, my app maxes out at 7.5 MB of RAM; with the performance counter it starts at 12.5 MB.
Over 5 seconds, during which my thread ran 5 times (once per second), my app's memory grew by 1 MB, and it kept growing steadily over time, although it does level out (in my case, anyway) 3-4 MB above the starting point. So where my app usually uses 7.5 MB of RAM with the code above, the performance-counter version levelled out at 16.5 MB, an increase of 9 MB. Note: the code above does not cause this increase.
So, if your application was built in a manner where resource usage and timing is key I would suggest against using Performance counters because of these reasons. Otherwise go ahead as it works without all the mess.
As for my app, performance counters will be detrimental to my software's purpose.

I think you have a bug in your formula. You want to basically compute CPU usage as this:
CPU Usage = (KernelTimeDiff + UserTimeDiff) / (KernelTimeDiff + UserTimeDiff + IdleTimeDiff)
So, a quick modification to your code:
// TimeSpan system = sysKernelDifferenceTs.Add(sysUserDifferenceTs);
//Double cpuUsage = (((system.Subtract(sysIdleDifferenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
TimeSpan totaltime = sysKernelDifferenceTs.Add(sysUserDifferenceTs);
totaltime = totaltime.Add(sysIdleDifferenceTs);
int cpuUsage = (int)(100 - (sysIdleDifferenceTs.TotalMilliseconds * 100) / totaltime.TotalMilliseconds); // explicit cast: the right-hand side is a double
Console.WriteLine("CPU: " + cpuUsage + "%");
You originally declared cpuUsage as "Double". I'm not sure if you wanted floating-point precision, but your output only ever showed integer precision, because you cast the result to int before printing it. If you need higher precision, keep the result as a double:
Double cpuUsage = 100.0 - (sysIdleDifferenceTs.TotalMilliseconds * 100.0) / totaltime.TotalMilliseconds;
Also, regarding being in sync with Task Manager: Task Manager, as I understand it, uses perf counters. (I would suspect that GetSystemTimes makes perf counter calls under the hood, but perhaps not.) And I'm not sure why you wouldn't use perf counters either. The "% Processor Time" counter (there's one per logical CPU) gives you the percentage directly; it is still computed from two samples, but the PDH functions do that calculation for you. Use the PDH helper functions instead of the legacy registry-key APIs to get at it. You can do this from an unmanaged C/C++ DLL that exports a "GetCpuUsage" function back to your C# code, but I don't see why you couldn't just P/Invoke the PDH functions from C# either. I don't know about the overhead you speak of, and I'm not sure I understand your reference to "the delay in calling the next result" either.

While loop execution time

We were having a performance issue in a C# while loop. The loop was extremely slow while doing only one simple math calculation. It turns out that parmIn can be a huge number, anywhere from 999999999 to int.MaxValue. We hadn't anticipated such a large value of parmIn, and we have fixed our code using a different approach.
The loop, simplified below, does one math calculation. I'm just curious: what is the actual execution time of a single iteration of a while loop containing one simple math calculation?
int v1=0;
while(v1 < parmIn) {
v1+=parmIn2;
}

There is something else going on here. The following completes in ~100 ms for me. You say that parmIn can approach int.MaxValue. If this is true and parmIn2 is > 1, you're not checking whether adding parmIn2 to v1 will overflow. If parmIn >= int.MaxValue - parmIn2, your loop might never complete, because v1 can wrap around to a negative value near int.MinValue and keep going.
static void Main(string[] args)
{
int i = 0;
int x = int.MaxValue - 50;
int z = 42;
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
while (i < x)
{
i += z;
}
st.Stop();
Console.WriteLine(st.ElapsedMilliseconds.ToString()); // Elapsed.Milliseconds would be only the 0-999 ms component
Console.ReadLine();
}
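To catch the wraparound described above instead of looping forever, here is a sketch using C#'s checked arithmetic, which throws OverflowException rather than silently wrapping (the class and method names are illustrative):

```csharp
using System;

public static class SafeStepping
{
    // Same loop as in the question, but checked() makes int overflow throw
    // OverflowException instead of wrapping around to a negative value.
    public static int StepUntil(int parmIn, int parmIn2)
    {
        int v1 = 0;
        while (v1 < parmIn)
        {
            v1 = checked(v1 + parmIn2);
        }
        return v1;
    }
}
```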

Assuming an optimal compiler, it should be one operation to check the while condition, and one operation to do the addition.

The time, small as it is, to execute just one iteration of the loop shown in your question is ... surprise ... small.
However, exactly how small depends on the actual CPU speed and whatnot.
It should be just a few machine instructions, so not many cycles to pass once through the iteration, but there could be a few cycles to loop back up, especially if branch prediction fails.
In any case, the code as shown either suffers from:
Premature optimization (in that you're asking about timing for it)
Incorrect assumptions. If parmIn is big, you can probably get much faster code by calculating how many loop iterations you would have to perform and doing a multiplication instead. (Note again that this might be an incorrect assumption, which is why there is only one sure way to find performance issues: measure, measure, measure.)
What is your real question?

It depends on the processor you are using and the calculation it is performing. (For example, even on some modern architectures, an add may take only one clock cycle, but a divide may take many clock cycles. There is a comparison to determine if the loop should continue, which is likely to be around one clock cycle, and then a branch back to the start of the loop, which may take any number of cycles depending on pipeline size and branch prediction)
IMHO the best way to find out more is to put the code you are interested in into a very large loop (millions of iterations), time the loop, and divide by the number of iterations. This will give you an idea of how long each iteration of the loop takes (on your PC). You can try different operations and learn a bit about how your PC works. I prefer this hands-on approach (at least to start with) because you learn much more from actually trying it than from asking someone else to tell you the answer.

The while loop is a couple of instructions, plus one instruction for the math operation, so you're really looking at a minimal execution time for one iteration. It's the sheer number of iterations you're doing that is killing you.
Note that a tight loop like this has other implications as well: it bogs down one CPU, and it blocks the UI thread (if it's running on it). So not only is it slow due to the number of operations, it also adds a perceived performance impact by making the whole machine look unresponsive.

If you're interested in the actual execution time, why not time it for yourself and find out?
int parmIn = 10 * 1000 * 1000; // 10 million
int parmIn2 = 1; // a step of 1 means the loop runs exactly parmIn times
int v1=0;
Stopwatch sw = Stopwatch.StartNew();
while(v1 < parmIn) {
v1+=parmIn2;
}
sw.Stop();
double opsPerSec = (double)parmIn / sw.Elapsed.TotalSeconds;
And, of course, the time for one iteration is 1/opsPerSec.

Whenever someone asks how fast the control structures in a language are, you know they are trying to optimize the wrong thing. If you find yourself changing all your i++ to ++i, or all your switch statements to if...else, for speed, you are micro-optimizing. And micro-optimizations almost never give you the speed you want. Instead, think a bit more about what you are really trying to do and devise a better way to do it.
I'm not sure if the code you posted is really what you intend to do or if it is simply the loop stripped down to what you think is causing the problem. If it is the former then what you are trying to do is find the largest value of a number that is smaller than another number. If this is really what you want then you don't really need a loop:
// assuming v1, parmIn and parmIn2 are integers,
// and you want the largest number (v1) that is
// smaller than parmIn but is a multiple of parmIn2.
// AGAIN, assuming INTEGER MATH:
v1 = (parmIn/parmIn2)*parmIn2;
EDIT: I just realized that the code as originally written gives the smallest multiple of parmIn2 that is greater than or equal to parmIn (for positive parmIn and parmIn2). So the correct code is:
v1 = (parmIn % parmIn2 == 0) ? parmIn : ((parmIn/parmIn2)*parmIn2)+parmIn2;
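A quick way to gain confidence in a closed-form replacement like this is to compare it with the original loop on many small inputs. A sketch, assuming parmIn >= 0 and parmIn2 > 0 (the loop and the formula disagree for negative parmIn); the class and method names are illustrative:

```csharp
using System;

public static class NextMultiple
{
    // The original loop from the question.
    public static int ByLoop(int parmIn, int parmIn2)
    {
        int v1 = 0;
        while (v1 < parmIn)
            v1 += parmIn2;
        return v1;
    }

    // Closed form: the smallest multiple of parmIn2 that is >= parmIn.
    // Assumes parmIn >= 0, parmIn2 > 0, and that the result fits in an int.
    public static int ByFormula(int parmIn, int parmIn2) =>
        parmIn % parmIn2 == 0 ? parmIn : (parmIn / parmIn2) * parmIn2 + parmIn2;

    // Compares both versions on every small input; returns the first mismatch, or null.
    public static string FirstMismatch(int maxParmIn, int maxParmIn2)
    {
        for (int a = 0; a <= maxParmIn; a++)
            for (int b = 1; b <= maxParmIn2; b++)
                if (ByLoop(a, b) != ByFormula(a, b))
                    return $"parmIn={a}, parmIn2={b}";
        return null;
    }
}
```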
If this is not what you really want, then my advice remains the same: think a bit about what you are really trying to do (or ask on Stack Overflow) instead of trying to find out whether while or for is faster. Of course, you won't always find a mathematical solution to the problem, in which case there are other strategies to reduce the number of loop iterations. Here's one based on your current problem: keep doubling the incrementer until it is too large, then back off until it is just right:
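int v1=0;
int incrementer=parmIn2;
// keep doubling the incrementer to
// speed up the loop
// (assumes parmIn > 0, parmIn2 > 0, and that nothing overflows int):
while(v1 < parmIn) {
v1+=incrementer;
incrementer=incrementer*2;
}
// now v1 is too big, so undo the last step
// (incrementer was doubled after that step was added)
// and resume the normal loop:
v1-=incrementer/2;
while(v1 < parmIn) {
v1+=parmIn2;
}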
Here's yet another alternative that speeds up the loop:
// First count at 100x speed
while(v1 < parmIn) {
v1+=parmIn2*100;
}
// back off and count at 50x speed
v1-=parmIn2*100;
while(v1 < parmIn) {
v1+=parmIn2*50;
}
// back off and count at 10x speed
v1-=parmIn2*50;
while(v1 < parmIn) {
v1+=parmIn2*10;
}
// back off and count at normal speed
v1-=parmIn2*10;
while(v1 < parmIn) {
v1+=parmIn2;
}
In my experience, especially with graphics programming where you have millions of pixels or polygons to process, speeding up code usually involves adding even more code, which translates to more processor instructions, instead of trying to find the fewest instructions possible for the task at hand. The trick is to avoid processing what you don't have to.

Testing your code for speed?

I'm a total newbie, but I was writing a little program that worked on strings in C# and I noticed that if I did a few things differently, the code executed significantly faster.
So it got me wondering: how do you go about clocking your code's execution speed? Are there any (free) utilities? Do you go about it the old-fashioned way with a System.Timer and do it yourself?

What you are describing is known as performance profiling. There are many programs you can use for this, such as the JetBrains profiler or the ANTS profiler, although most will slow down your application while they are measuring its performance.
To hand-roll your own performance profiling, you can use System.Diagnostics.Stopwatch and a simple Console.WriteLine, like you described.
Also keep in mind that the .NET JIT compiler compiles and optimizes code at run time, so timings can depend on how and how often a method is called; play around with loops of differing sizes and with techniques such as recursive calls to get a feel for what works best.

ANTS Profiler from RedGate is a really nice performance profiler. dotTrace Profiler from JetBrains is also great. These tools let you see performance metrics that can be drilled down to each individual line.
If you want to ensure that a specific method stays within a specific performance threshold during unit testing, I would use the Stopwatch class to measure the execution time of the method one or more times in a loop, calculate the average, and then Assert against the result.
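A minimal sketch of that Stopwatch approach, with one untimed warm-up call so JIT compilation isn't counted. AverageMilliseconds is a hypothetical helper name, and the threshold you assert against is up to you:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs action once untimed (so JIT compilation is not measured), then times
    // `runs` executions with a single Stopwatch and returns the average in ms.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        action();                                  // warm-up call
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

In a unit test you would then assert that the returned average is below the threshold your method has to meet.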

Just a reminder: make sure to compile in Release, not Debug! (I've seen this mistake made by seasoned developers; it's easy to forget.)

What you are describing is 'Performance Tuning'. When we talk about performance tuning, there are two angles to it: (a) Response time: how long it takes to execute a particular request/program. (b) Throughput: how many requests it can execute per second. When we typically 'optimize', that is, eliminate unnecessary processing, both response time and throughput improve. However, if you have wait events in your code (like Thread.sleep(), I/O waits, etc.), your response time is affected but your throughput is not. By adopting parallel processing (spawning multiple threads) we can improve response time, but throughput will not improve. Typically, for server-side applications both response time and throughput are important. For desktop applications (like an IDE), throughput is not important; only response time is.
You can measure response time by 'Performance Testing': you just note down the response time for all key transactions. You can measure throughput by 'Load Testing': you pump requests continuously from a sufficiently large number of threads/clients that the CPU usage of the server machine is 80-90%. When we pump requests, we need to maintain the ratio between different transactions (called the transaction mix); e.g. in a reservation system there will be 10 bookings for every 100 searches and one cancellation for every 10 bookings, etc.
After identifying the transactions that require tuning for response time (performance testing), you can identify the hot spots using a profiler.
You can identify the hot spots for throughput by comparing, for each transaction, its response time * its fraction of the transaction mix. Assume that in a search/booking/cancellation scenario the ratio is 89:10:1,
and the response times are 0.1 s, 10 s and 15 s:
load for search = 0.1 * 0.89 = 0.089
load for booking = 10 * 0.10 = 1
load for cancellation = 15 * 0.01 = 0.15
Here tuning booking will yield maximum impact on throughput.
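The same arithmetic as a small sketch that picks the transaction with the largest weighted load (the helper and its parameters are illustrative, and it assumes three equal-length, non-empty arrays):

```csharp
using System;

public static class ThroughputHotSpot
{
    // Weighted load = response time (seconds) * fraction of the transaction mix.
    // The transaction with the largest weighted load is the best tuning target.
    public static string Busiest(string[] names, double[] responseSeconds, double[] mixFraction)
    {
        int bestIndex = 0;
        for (int i = 1; i < names.Length; i++)
        {
            double load = responseSeconds[i] * mixFraction[i];
            double bestLoad = responseSeconds[bestIndex] * mixFraction[bestIndex];
            if (load > bestLoad) bestIndex = i;
        }
        return names[bestIndex];
    }
}
```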
You can also identify hot spots for throughput by repeatedly taking thread dumps (in the case of Java-based applications).

Use a profiler.
Ants (http://www.red-gate.com/Products/ants_profiler/index.htm)
dotTrace (http://www.jetbrains.com/profiler/)
If you need to time one specific method only, the Stopwatch class might be a good choice.

I do the following things:
1) I use ticks (e.g. in VB.NET, Now.Ticks) for measuring the current time. I subtract the starting ticks from the finished ticks value and divide by TimeSpan.TicksPerSecond to get how many seconds it took.
2) I avoid UI operations (like Console.WriteLine).
3) I run the code over a substantial loop (like 100,000 iterations) to factor out usage / OS variables as best as I can.

You can use the Stopwatch class to time methods. Remember that the first call is often slow because the code has to be JIT-compiled.

There is a built-in Visual Studio option (Team Edition for Software Developers) that might address some performance analysis needs. From the Visual Studio 2005 menu, select Tools->Performance Tools->Performance Wizard...
[GSS is probably correct that you must have Team Edition]

This is a simple example of testing code speed. I hope it helps.
using System;
using System.Collections;
using System.Diagnostics;
class Program {
static void Main(string[] args) {
const int steps = 10000;
Stopwatch sw = new Stopwatch();
ArrayList list1 = new ArrayList();
sw.Start();
for(int i = 0; i < steps; i++) {
list1.Add(i);
}
sw.Stop();
Console.WriteLine("ArrayList:\tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
MyList list2 = new MyList(); // MyList is your own list class under test
sw.Restart(); // reset and start; a plain Start() would keep adding to the ArrayList time
for(int i = 0; i < steps; i++) {
list2.Add(i);
}
sw.Stop();
Console.WriteLine("MyList: \tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
}
}
