I am developing a program in C#, and thanks to the MATLAB .NET Builder,
I am using the MATLAB Mapping Toolbox function "polybool", which in one of its options calculates the difference of two polygons in 2-D.
The problem is that the function takes about 0.01 seconds to finish, which is bad for me
because I call it a lot.
And this doesn't make sense at all, because the polygons are only 5 points each, so there is no
way it should take 0.01 seconds to find the result.
Does anyone have any ideas?
How are you computing the 0.01 seconds? If this is total operational time, it may very well be the marshaling in and out of the toolbox functionality, which will take some time. The actual routine may be running quickly, but getting your data from C# into the routine, and the results back, will have some overhead involved with the process.
Granted, this overhead probably scales well: since it's most likely (mostly) constant, if you start dealing with larger polygons you'll probably see your overall efficiency scale very well.
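One way to check whether marshaling dominates is to time many repeated calls and look at the per-call average for tiny inputs; if it stays near 10 ms, the geometry itself is not the cost. The sketch below assumes a hypothetical wrapper class (MapWrapper and its Polybool method are placeholder names for whatever the .NET Builder generated for you, not a real API):

    // Hypothetical sketch: estimate the per-call cost of crossing the C#/MATLAB boundary.
    using System;
    using System.Diagnostics;

    class MarshalingCheck
    {
        static void Main()
        {
            var wrapper = new MapWrapper();          // hypothetical generated class
            double[] xs = { 0, 1, 1, 0, 0 };
            double[] ys = { 0, 0, 1, 1, 0 };

            const int iterations = 1000;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                wrapper.Polybool("subtraction", xs, ys, xs, ys);  // hypothetical signature
            }
            sw.Stop();

            // If this stays near 10 ms per call even for 5-point polygons,
            // the cost is almost certainly the marshaling, not the geometry.
            Console.WriteLine($"Average per call: {sw.Elapsed.TotalMilliseconds / iterations:F3} ms");
        }
    }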
I'm currently attempting to create a Bitcoin Miner written in C# XNA.
https://github.com/Generalkidd/XNAMiner
Now the problem is that the actual number crunching of the miner seems to be taking up too much CPU time, so the UI of the program pretty much freezes at launch, although I believe the calculations are still happening in the background despite the window being frozen and unresponsive. I tried implementing Aphid's ParallelTasks library and migrated some of the for-loops into a different thread. I didn't fully understand how these parallel for-loops worked, so the version I created did not iterate correctly; however, the program did speed up a lot. There are still a couple of for-loops left, as well as a bunch of foreach loops.
What's the easiest and most efficient way to optimize my code? Should I try moving each loop into its own thread? Or try moving entire methods into their own threads? Or would it be possible to use the GPU for these calculations (it'd ultimately be better that way given the state of CPU mining).
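As a side note on the threading part of the question, the usual pattern is to keep the UI thread free and push the hashing loop onto a background task, letting Parallel.For split the work across cores. This is only a sketch with invented method names (ScanNonceRange is a placeholder, not code from the XNAMiner repository):

    // Sketch: hash on a background task so the UI/XNA update thread stays responsive,
    // and let Parallel.For hand independent slices of the nonce space to each core.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class MinerSketch
    {
        // Hypothetical placeholder for the per-chunk hashing work.
        void ScanNonceRange(int chunk) { /* hash the nonces in this slice */ }

        public Task StartMiningAsync(CancellationToken token)
        {
            return Task.Run(() =>
            {
                var options = new ParallelOptions
                {
                    CancellationToken = token,
                    MaxDegreeOfParallelism = Environment.ProcessorCount
                };

                // Each chunk is an independent slice of the nonce space, so no two
                // parallel iterations share loop state -- the usual bug when a
                // sequential for-loop is converted naively.
                Parallel.For(0, 1024, options, chunk => ScanNonceRange(chunk));
            }, token);
        }
    }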
I hope you are doing this as an educational effort, as CPU/GPU mining has been obsolete since 2011. You can barely make back your hardware investment even with free electricity. ASICs are the new thing for mining now.
Different GPU/CPU hash rates
Mining Calculator
I'm curious how does a typical C# profiler work?
Are there special hooks in the virtual machine?
Is it easy to scan the byte code for function calls and inject calls to start/stop timer?
Or is it really hard and that's why people pay for tools to do this?
(As a side note, I find this a bit interesting because it's so rare - Google misses the boat completely here: searching "how does a c# profiler work?" doesn't work at all - the results are about air conditioners...)
There is a free CLR Profiler by Microsoft, version 4.0.
https://www.microsoft.com/downloads/en/details.aspx?FamilyID=be2d842b-fdce-4600-8d32-a3cf74fda5e1
BTW, there's a nice section in the CLR Profiler doc that describes how it works in detail, on page 103. The source is included as part of the distribution.
Is it easy to scan the byte code for function calls and inject calls to start/stop timer?
Or is it really hard and that's why people pay for tools to do this?
Injecting calls is hard enough that tools are needed to do it.
Not only is it hard, it's a very indirect way to find bottlenecks.
The reason is that a bottleneck is one statement, or a small number of statements, in your code that are responsible for a good percentage of the time being spent - time that could be reduced significantly, i.e. it's not truly necessary, i.e. it's wasteful.
IF you can tell the average inclusive time of one of your routines (including IO time), and IF you can multiply it by how many times it has been called, and divide by the total time, you can tell what percent of time the routine takes.
If the percent is small (like 10%) you probably have bigger problems elsewhere.
If the percent is larger (like 20% to 99%) you could have a bottleneck inside the routine.
So now you have to hunt inside the routine for it, looking at things it calls and how much time they take. Also you want to avoid being confused by recursion (the bugaboo of call graphs).
There are profilers (such as Zoom for Linux, Shark, & others) that work on a different principle.
The principle is that there is a function call stack, and during all the time a routine is responsible for (either doing work or waiting for other routines to do work that it requested) it is on the stack.
So if it is responsible for 50% of the time (say), then that's the amount of time it is on the stack,
regardless of how many times it was called, or how much time it took per call.
Not only is the routine on the stack, but the specific lines of code costing the time are also on the stack.
You don't need to hunt for them.
Another thing you don't need is precision of measurement.
If you took 10,000 stack samples, the guilty lines would be measured at 50 +/- 0.5 percent.
If you took 100 samples, they would be measured as 50 +/- 5 percent.
If you took 10 samples, they would be measured as 50 +/- 16 percent.
In every case you find them, and that is your goal.
(And recursion doesn't matter. All it means is that a given line can appear more than once in a given stack sample.)
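Those error bars are just the binomial standard error of a sampled proportion, sqrt(p(1-p)/n). A quick sketch of the arithmetic behind the 50% example above:

    // Margin of error (one standard error) for a line that is on the stack
    // in a fraction p of the samples, given n samples: sqrt(p * (1 - p) / n).
    using System;

    class SamplingError
    {
        static void Main()
        {
            double p = 0.5;
            foreach (int n in new[] { 10, 100, 10000 })
            {
                double stdErr = Math.Sqrt(p * (1 - p) / n);
                // n = 10    -> ~16%
                // n = 100   -> 5%
                // n = 10000 -> 0.5%
                Console.WriteLine($"{n,6} samples: {p:P0} +/- {stdErr:P1}");
            }
        }
    }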
On this subject, there is lots of confusion. At any rate, the profilers that are most effective for finding bottlenecks are the ones that sample the stack, on wall-clock time, and report percent by line. (This is easy to see if certain myths about profiling are put in perspective.)
1) There's no such thing as "typical". People collect profile information by a variety of means: time sampling the PC, inspecting stack traces, capturing execution counts of methods/statements/compiled instructions, inserting probes in code to collect counts and optionally calling contexts to get profile data on a call-context basis. Each of these techniques might be implemented in different ways.
2) There's profiling "C#" and profiling "CLR". In the MS world, you could profile CLR and back-translate CLR instruction locations to C# code. I don't know if Mono uses the same CLR instruction set; if they did not, then you could not use the MS CLR profiler; you'd have to use a Mono IL profiler. Or, you could instrument C# source code to collect the profiling data, and then compile/run/collect that data on either MS, Mono, or somebody's C# compatible custom compiler, or C# running in embedded systems such as WinCE where space is precious and features like CLR-built-ins tend to get left out.
One way to instrument source code is to use source-to-source transformations to map the code from its initial state to code that contains the data-collecting code as well as the original program. This paper on instrumenting code to collect test coverage data shows how a program transformation system can be used to insert test coverage probes, by inserting statements that set block-specific boolean flags when a block of code is executed. A counting profiler substitutes counter-incrementing instructions for those probes. A timing profiler inserts clock-snapshot/delta computations for those probes. Our C# Profiler implements both counting and timing profiling for C# source code this way; it also collects call graph data by using more sophisticated probes that record the execution path, so it can produce timing data on call graphs. This scheme works anywhere you can get your hands on a halfway decent resolution time value.
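To make the probe idea concrete, here is a rough before/after sketch of what a counting-plus-timing instrumentation might produce; ProbeCounters and the probe placement are invented for illustration and are not the output of any particular tool:

    // Original method:
    //     int Total(int[] xs) { int s = 0; foreach (var x in xs) s += x; return s; }
    //
    // A counting profiler rewrites it so each basic block bumps a counter;
    // a timing profiler instead snapshots a clock at entry/exit and accumulates the delta.
    using System.Diagnostics;

    static class ProbeCounters
    {
        public static readonly long[] Hits = new long[64];
        public static readonly long[] Ticks = new long[64];
    }

    class Instrumented
    {
        int Total(int[] xs)
        {
            ProbeCounters.Hits[0]++;                        // method entry count
            long start = Stopwatch.GetTimestamp();          // timing probe
            int s = 0;
            foreach (var x in xs)
            {
                ProbeCounters.Hits[1]++;                    // loop body count
                s += x;
            }
            ProbeCounters.Ticks[0] += Stopwatch.GetTimestamp() - start;
            return s;
        }
    }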
This is a link to a lengthy article that discusses both instrumentation and sampling methods:
http://smartbear.com/support/articles/aqtime/profiling/
I programmed my own string matching algorithm, and I want to measure its time accurately,
to compare it with other algorithms and check whether my implementation is better.
I tried Stopwatch, but it gives a different time in each run, because of the multiple processes running on the Windows OS. I heard about RDTSC, which can get the number of
cycles consumed, but I do not know if it gives a different cycle count on each execution too.
Please help me: can RDTSC give an accurate and repeatable measurement of cycles for a C# function, or is it similar to Stopwatch? Which is the best way to get the cycle count for a C# function alone, without the other running processes? Thanks a lot for any help or hints.
it gives a different time in each run, because of the multiple processes running on the Windows OS.
That is in the nature of all benchmarks.
Good benchmarks offset this by statistical means, i.e. measuring often enough to offset any side-effects from other running programs. This is the way to go. As far as precision goes, StopWatch is more than enough for benchmarks.
This requires several things (without getting into statistical details, which I’m not too good at either):
An individual measurement should last long enough to offset imprecision introduced by the measuring method (even RDTSC isn't completely precise), and to offset calling overhead. After all, you want to measure your algorithm, not the time it takes to run the testing loop and invoke your testing method.
Enough test runs to have confidence in the result: the more data, the higher the robustness of your statistic.
Minimize external influences, in particular systematic bias. That is to say, run all your tests on the same machine under same conditions, otherwise the results cannot be compared. At all.
Furthermore, if you run multiple runs of your tests (and you should!) interleave the different methods.
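Putting those points together, a minimal Stopwatch-based harness might look like the sketch below; MyMatch and OtherMatch are placeholder names standing in for the algorithms being compared:

    // Sketch: warm up, run many repetitions per measurement so loop overhead is
    // amortised, repeat the measurement several times, and interleave the candidates.
    using System;
    using System.Diagnostics;
    using System.Linq;

    class Benchmark
    {
        static bool MyMatch(string text, string pattern) => text.Contains(pattern);
        static bool OtherMatch(string text, string pattern) => text.IndexOf(pattern) >= 0;

        static double MeasureMs(Func<string, string, bool> match, string text, string pattern)
        {
            const int reps = 100_000;                 // long enough to swamp timer noise
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < reps; i++) match(text, pattern);
            sw.Stop();
            return sw.Elapsed.TotalMilliseconds;
        }

        static void Main()
        {
            string text = new string('a', 10_000) + "needle";
            string pattern = "needle";

            MeasureMs(MyMatch, text, pattern);        // warm-up (JIT, caches)
            MeasureMs(OtherMatch, text, pattern);

            var mine = new double[10];
            var other = new double[10];
            for (int run = 0; run < 10; run++)        // interleave the two methods
            {
                mine[run] = MeasureMs(MyMatch, text, pattern);
                other[run] = MeasureMs(OtherMatch, text, pattern);
            }

            Console.WriteLine($"mine : {mine.Average():F2} ms (min {mine.Min():F2})");
            Console.WriteLine($"other: {other.Average():F2} ms (min {other.Min():F2})");
        }
    }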
I think that to get the most accurate information you should use interop to call GetThreadTimes():
http://msdn.microsoft.com/en-us/library/ms683237%28v=vs.85%29.aspx
The link includes the signature for calling the function from C#.
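For reference, a minimal P/Invoke sketch, under the assumption that you only need kernel/user CPU time for the current thread (the values come back in 100-nanosecond units):

    // Sketch: measure CPU time charged to the current thread via GetThreadTimes,
    // which excludes time spent in other processes, unlike wall-clock timers.
    using System;
    using System.Runtime.InteropServices;

    static class ThreadTimes
    {
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool GetThreadTimes(IntPtr hThread,
            out long creationTime, out long exitTime,
            out long kernelTime, out long userTime);    // 100-ns units

        [DllImport("kernel32.dll")]
        static extern IntPtr GetCurrentThread();        // pseudo-handle, no need to close

        public static TimeSpan CurrentThreadCpuTime()
        {
            if (!GetThreadTimes(GetCurrentThread(),
                    out _, out _, out long kernel, out long user))
                throw new System.ComponentModel.Win32Exception();
            return TimeSpan.FromTicks(kernel + user);   // TimeSpan ticks are also 100 ns
        }
    }

    // Usage: snapshot before and after the code under test.
    // var before = ThreadTimes.CurrentThreadCpuTime();
    // RunAlgorithm();
    // var cpu = ThreadTimes.CurrentThreadCpuTime() - before;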
I have a process I need to optimize, and I was wondering how long a multiplication operation between two doubles takes. If I can cut 1000 of these, will it actually make a difference in the overall performance of my process?
This is highly system specific. On my system, it only takes a few milliseconds to do 10 million multiplication operations. Removing 1000 is probably not going to be noticeable.
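If you want to sanity-check that figure on your own machine, a rough sketch follows; the result is printed so the JIT cannot discard the loop as dead code, and the serial dependency means this measures latency rather than peak throughput:

    // Rough sanity check: time 10 million double multiplications.
    using System;
    using System.Diagnostics;

    class MulTiming
    {
        static void Main()
        {
            double acc = 1.0000001;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10_000_000; i++)
                acc *= 1.0000001;                 // the operation under test
            sw.Stop();
            Console.WriteLine($"{sw.ElapsedMilliseconds} ms, result {acc}");
        }
    }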
If you really want to optimize your routine, this isn't the best approach. The better approach is to profile it, and find the bottleneck in your current implementation (which will likely not be what you expect). Then look at that bottleneck, and try to come up with a better algorithm. Focus on overall algorithms first, and optimize those.
If it's still too slow, then you can start trying to optimize the actual routine in the slower sections or the ones called many times, first.
The only effective means of profiling is to measure first (and after!).
That entirely depends on the size of the factors. I can do single-digit multiplication (e.g. 7×9) in my head in a fraction of a second, whereas it would take me a few minutes to compute 365286×475201.
Modern Intel CPUs do tens of billions of floating-point multiplies per second. I wouldn't worry about 1000 of them if I were you.
Intel doc showing FLOP performance of their CPUs
This depends on various things: the CPU you are using, the other processes currently running, what the JIT does, and so on.
The only reliable way to answer this question is to use a profiler and measure the effect of your optimization.
I am trying to make the loading part of a C# program faster. Currently it takes about 15 seconds to load.
At first glance, the things done during loading include constructing many 3rd-party UI components, loading layout files, XMLs, DLLs and resource files, reflection, waiting for WndProc... etc.
I used something really simple to see how long some part takes:
breakpointing on a double that holds the total milliseconds of a TimeSpan, computed as the difference between a DateTime.Now at the start and a DateTime.Now at the end.
Trying that a few times gives me something like:
11s 13s 12s 12s 7s 11s 12s 11s 7s 13s 7s.. (Usually 12s, but 7s sometimes)
If I add SuspendLayout and BeginUpdate like hell, call things via reflection once instead of many times, and reduce some redundant redundant computation redundancy, the times are like 3s 4s 3s 4s 3s 10s 4s 4s 3s 4s 10s 3s 10s... (Usually 4s, but 10s sometimes)
In both cases the times are not consistent but look more like a bimodal distribution, which really made me unsure whether my changes to the code are actually making it faster.
So I would like to know what will cause such result.
Debug mode?
The "C# hve to compile/interpret the code on the 1st time it runs, but the following times will be faster" thing?
The waiting of WndProc message?
The reflections? PropertyInfo? Reflection.Assembly?
Loading files? XML? DLL? resource file?
UI Layouts?
(There are surely no internet/network/database access in that part)
Thanks.
Profiling by stopping in the debugger is not a reliable way to get timings, as you've discovered.
Profiling by writing times to a log works fine, although why do all this by hand when you can just launch the program in dotTrace? (Free trial, fully functional).
Another thing that works when you don't have access to a profiler is what I call the binary approach: look at what happens in the code and try to disable about half of it using comments. Note the effect on the running time. If it appears significant, repeat the process with half of that half, and so on recursively until you narrow in on the most significant piece of work. The difficulty is in simulating the side effects of the missing code so that the remaining code can still work, so this is harder than using a debugger, but it can be quicker than adding a lot of manual time logging, because the binary approach lets you zero in on the slowest place in logarithmic time.
Raymond Chen's advice is good here. When people ask him "How can I make my application start up faster?" he says "Do less stuff."
(And ALWAYS profile the release build - profiling the debug build is generally a wasted effort).
Profile it. You can use EQATEC Profiler; it's free.
Well, the best thing is to run your application through a profiler and see what the bottlenecks are. I've personally used dotTrace, there are plenty of others you can find on the web.
Debug mode turns off a lot of JIT optimizations, so apps will run a lot slower than release builds. Whatever the mode, JITting has to happen, so I'd discount that as a significant factor. Time to read files from disk can vary based on the OS's caching mechanism, and whether you're doing a cold start or a warm start.
If you have to use timers to profile, I'd suggest repeating the experiment a large number of times and taking the average.
Profiling your code is definitely the best way to identify which areas are taking the longest to run.
As for the other part of your question, about the inconsistent timings: timings on a multitasking OS are inherently inconsistent, and working with managed code throws the garbage collector into the equation too. It could be that the GC is kicking in during your timing, which will obviously slow things down.
If you want to try to get a "purer" timing, try forcing a GC collect before you start your timers; this way it is less likely to kick in during your timed section. Do remember to remove the forced collection afterwards, as second-guessing when the GC should run normally results in poorer performance.
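A minimal sketch of that idea (RunStartupSection is a hypothetical stand-in for the code being timed):

    using System;
    using System.Diagnostics;

    class GcSettledTiming
    {
        static void RunStartupSection() { /* hypothetical placeholder for the code being timed */ }

        static void Main()
        {
            // Settle the heap first so a collection is less likely to fire
            // inside the timed section. Remove this once profiling is done.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var sw = Stopwatch.StartNew();
            RunStartupSection();
            sw.Stop();
            Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds} ms");
        }
    }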
Apart from the obvious (profiling), which will tell you precisely where time is being spent, there are some other points that spring to mind:
To get reasonable timing results with the approach you are using, run a release build of your program and have it dump the timing results to a file (e.g. with Trace.WriteLine); timing a debug build will give you spurious results. When running the timing tests, quit all other applications (including your debugger) to minimise the load on your computer and get more consistent results. Run the program many times and look at the average timings. Finally, bear in mind that Windows caches a lot, so the first run will be slow and subsequent runs will be much faster. This will at least give you a more consistent basis for telling whether your improvements are making a significant difference.
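A small sketch of that dump-to-file approach (the phase names are invented; TextWriterTraceListener routes Trace output to a log file):

    // Sketch: log per-phase startup timings to a file via Trace,
    // so a release build can be measured without a debugger attached.
    using System.Diagnostics;

    class StartupTiming
    {
        static void Main()
        {
            Trace.Listeners.Add(new TextWriterTraceListener("startup-timing.log"));
            Trace.AutoFlush = true;

            var sw = Stopwatch.StartNew();
            LoadLayoutFiles();                                    // hypothetical init phase
            Trace.WriteLine($"Layout files: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            BuildUiComponents();                                  // hypothetical init phase
            Trace.WriteLine($"UI components: {sw.ElapsedMilliseconds} ms");
        }

        static void LoadLayoutFiles() { }
        static void BuildUiComponents() { }
    }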
Don't try and optimise code that shouldn't be run in the first place - Can you defer any of the init tasks? You may find that some of the work can simply be removed from the init sequence. e.g. if you are loading a data file, check whether it is needed immediately - if not, then you could load it the first time it is needed instead of during program startup.