Parallel while loop in C# - c#

I try to write BBS-generator. But this generator is too slow. So I need to parallel while. But i did not how do it. Please help me:).
public static List<BigInteger> GenerateFullCycle(long N, long x0)
{
List<BigInteger> randomNumbers = new List<BigInteger>();
var bbs = new BBS_generator(N, x0);
BigInteger rvalue = bbs.Generate();
while(randomNumbers.Contains(rvalue) == false)
{
randomNumbers.Add(rvalue);
rvalue = bbs.Generate();
}
return randomNumbers;
}

You might want to look into improving your algorithm first. How large does the randomNumbers list get? Contains is pretty slow on larger lists. You might want to use a HashSet instead.
BBS is slow, as simple as that. Using BigInteger really doesn't help at all - also quite slow.
Apart from that, there is no way to parallelize this code. The second iteration on the loop depends on the first, the third on the second etc.. You can parallelize the GenerateFullCycle calls themselves, but that's about it.
Parallelization isn't a solve-it-all thing. Many things aren't parallelizable at all - and how much do you get by parallelization? Twice faster? Even if you managed to completely separate and run at 100% efficiency, a 4-core CPU still only means you're four times as fast. True, it does have its applications, but I fail to see how it could help a BBS generator.
And the number one rule of performance tweaking - profile. Where is your application actually spending time?

Maybe you can use something like a Background Worker:
http://msdn.microsoft.com/en-us/library/cc221403(v=vs.95).aspx

Related

Why was the parallel version slower than the sequential version in this example?

I've been learning a little about parallelism in the last few days, and I came across this example.
I put it side to side with a sequential for loop like this:
private static void NoParallelTest()
{
int[] nums = Enumerable.Range(0, 1000000).ToArray();
long total = 0;
var watch = Stopwatch.StartNew();
for (int i = 0; i < nums.Length; i++)
{
total += nums[i];
}
Console.WriteLine("NoParallel");
Console.WriteLine(watch.ElapsedMilliseconds);
Console.WriteLine("The total is {0}", total);
}
I was surprised to see that the NoParallel method finished way way faster than the parallel example given at the site.
I have an i5 PC.
I really thought that the Parallel method would finish faster.
Is there a reasonable explanation for this? Maybe I misunderstood something?
The sequential version was faster because the time spent doing operations on each iteration in your example is very small and there is a fairly significant overhead involved with creating and managing multiple threads.
Parallel programming only increases efficiency when each iteration is sufficiently expensive in terms of processor time.
I think that's because the loop performs a very simple, very fast operation.
In the case of the non-parallel version that's all it does. But the parallel version has to invoke a delegate. Invoking a delegate is quite fast and usually you don't have to worry how often you do that. But in this extreme case, it's what makes the difference. I can easily imagine that invoking a delegate will be, say, ten times slower (or more, I have no idea what the exact ratio is) than adding a number from an array.

Time Complexity Evaluation?

I have implemented cryptography and Steganography algorithms in a application and would like to make a time complexity evaluation for these algorithms. I don't have any idea how these tests should be performed.
Is there any testing framework which I can apply or how to make these tests?
Assuming you're not using recursion, you should look at your loops. It's fairly simple to count by inspection the number of times each loop is iterated based on a certain input. The time complexity is given by multiplication of the number of times nested loops are run. If you have more than one set of nested loops, add them all together and this is the time complexity. As an example:
for (int i=0; i<N; ++i)
{
// Some constant time operation happens here
for (int j=0; j<N; ++j)
{
// Some constant time operation happens here
}
// Some constant time operation happens here
}
This loop will execute in O(N) + O(N2) = O(N + N2) = O(N2) time since the outer constant time portions of the loop are executed in O(N) time, while the inner portion is performed in O(N)*O(N) = O(N2).
If you're using recursion, things are a bit harder to analyze, but it boils down to the same thing ... counting how many times constant time portions of your code get executed. Recommend you read a book on analysis of algorithms (This one is pretty much the standard on the subject) which will help you understand how best to analyze other algorithms as well.
You are manipulating pixels in a bitmap, pretty hard to come up with an algorithm that isn't going to be O(n^2). You could run timing tests with various sizes of bitmaps, use the Stopwatch class. Bitmap manipulations that are not computationally hard, like this, are fundamentally limited by RAM bus bandwidth. And the size of the cpu cache if you don't address the pixels the smart way (storage order). You'll know you have cache problems when the perf suddenly falls off a cliff as you increase the bitmap size.
You could unit test it. Assuming you have a method with parameters you can time how long it takes with different input sizes n. That way you can work out the time complexity with respect to n. This would be the simplest way. Not sure of any frameworks that could automate this for you.

While loop execution time

We were having a performance issue in a C# while loop. The loop was super slow doing only one simple math calc. Turns out that parmIn can be a huge number anywhere from 999999999 to MaxInt. We hadn't anticipated the giant value of parmIn. We have fixed our code using a different methodology.
The loop, coded for simplicity below, did one math calc. I am just curious as to what the actual execution time for a single iteration of a while loop containing one simple math calc is?
int v1=0;
while(v1 < parmIn) {
v1+=parmIn2;
}
There is something else going on here. The following will complete in ~100ms for me. You say that the parmIn can approach MaxInt. If this is true, and the ParmIn2 is > 1, you're not checking to see if your int + the new int will overflow. If ParmIn >= MaxInt - parmIn2, your loop might never complete as it will roll back over to MinInt and continue.
static void Main(string[] args)
{
int i = 0;
int x = int.MaxValue - 50;
int z = 42;
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
while (i < x)
{
i += z;
}
st.Stop();
Console.WriteLine(st.Elapsed.Milliseconds.ToString());
Console.ReadLine();
}
Assuming an optimal compiler, it should be one operation to check the while condition, and one operation to do the addition.
The time, small as it is, to execute just one iteration of the loop shown in your question is ... surprise ... small.
However, it depends on the actual CPU speed and whatnot exactly how small it is.
It should be just a few machine instructions, so not many cycles to pass once through the iteration, but there could be a few cycles to loop back up, especially if branch prediction fails.
In any case, the code as shown either suffers from:
Premature optimization (in that you're asking about timing for it)
Incorrect assumptions. You can probably get a much faster code if parmIn is big by just calculating how many loop iterations you would have to perform, and do a multiplication. (note again that this might be an incorrect assumption, which is why there is only one sure way to find performance issues, measure measure measure)
What is your real question?
It depends on the processor you are using and the calculation it is performing. (For example, even on some modern architectures, an add may take only one clock cycle, but a divide may take many clock cycles. There is a comparison to determine if the loop should continue, which is likely to be around one clock cycle, and then a branch back to the start of the loop, which may take any number of cycles depending on pipeline size and branch prediction)
IMHO the best way to find out more is to put the code you are interested into a very large loop (millions of iterations), time the loop, and divide by the number of iterations - this will give you an idea of how long it takes per iteration of the loop. (on your PC). You can try different operations and learn a bit about how your PC works. I prefer this "hands on" approach (at least to start with) because you can learn so much more from physically trying it than just asking someone else to tell you the answer.
The while loop is couple of instructions and one instruction for the math operation. You're really looking at a minimal execution time for one iteration. it's the sheer number of iterations you're doing that is killing you.
Note that a tight loop like this has implications on other things as well, as it bogs down one CPU and it blocks the UI thread (if it's running on it). Thus, not only it is slow due to the number of operations, it also adds a perceived perf impact due to making the whole machine look unresponsive.
If you're interested in the actual execution time, why not time it for yourself and find out?
int parmIn = 10 * 1000 * 1000; // 10 million
int v1=0;
Stopwatch sw = Stopwatch.StartNew();
while(v1 < parmIn) {
v1+=parmIn2;
}
sw.Stop();
double opsPerSec = (double)parmIn / sw.Elapsed.TotalSeconds;
And, of course, the time for one iteration is 1/opsPerSec.
Whenever someone asks about how fast control structures in any language you know they are trying to optimize the wrong thing. If you find yourself changing all your i++ to ++i or changing all your switch to if...else for speed you are micro-optimizing. And micro optimizations almost never give you the speed you want. Instead, think a bit more about what you are really trying to do and devise a better way to do it.
I'm not sure if the code you posted is really what you intend to do or if it is simply the loop stripped down to what you think is causing the problem. If it is the former then what you are trying to do is find the largest value of a number that is smaller than another number. If this is really what you want then you don't really need a loop:
// assuming v1, parmIn and parmIn2 are integers,
// and you want the largest number (v1) that is
// smaller than parmIn but is a multiple of parmIn2.
// AGAIN, assuming INTEGER MATH:
v1 = (parmIn/parmIn2)*parmIn2;
EDIT: I just realized that the code as originally written gives the smallest number that is a multiple of parmIn2 that is larger than parmIn. So the correct code is:
v1 = ((parmIn/parmIn2)*parmIn2)+parmIn2;
If this is not what you really want then my advise remains the same: think a bit on what you are really trying to do (or ask on Stackoverflow) instead of trying to find out weather while or for is faster. Of course, you won't always find a mathematical solution to the problem. In which case there are other strategies to lower the number of loops taken. Here's one based on your current problem: keep doubling the incrementer until it is too large and then back off until it is just right:
int v1=0;
int incrementer=parmIn2;
// keep doubling the incrementer to
// speed up the loop:
while(v1 < parmIn) {
v1+=incrementer;
incrementer=incrementer*2;
}
// now v1 is too big, back off
// and resume normal loop:
v1-=incrementer;
while(v1 < parmIn) {
v1+=parmIn2;
}
Here's yet another alternative that speeds up the loop:
// First count at 100x speed
while(v1 < parmIn) {
v1+=parmIn2*100;
}
// back off and count at 50x speed
v1-=parmIn2*100;
while(v1 < parmIn) {
v1+=parmIn2*50;
}
// back off and count at 10x speed
v1-=parmIn2*50;
while(v1 < parmIn) {
v1+=parmIn2*10;
}
// back off and count at normal speed
v1-=parmIn2*10;
while(v1 < parmIn) {
v1+=parmIn2;
}
In my experience, especially with graphics programming where you have millions of pixels or polygons to process, speeding up code usually involve adding even more code which translates to more processor instructions instead of trying to find the fewest instructions possible for the task at hand. The trick is to avoid processing what you don't have to.

Is Linq Faster, Slower or the same?

Is this:
Box boxToFind = AllBoxes.FirstOrDefault(box => box.BoxNumber == boxToMatchTo.BagNumber);
Faster or slower than this:
Box boxToFind ;
foreach (Box box in AllBoxes)
{
if (box.BoxNumber == boxToMatchTo.BoxNumber)
{
boxToFind = box;
}
}
Both give me the result I am looking for (boxToFind). This is going to run on a mobile device that I need to be performance conscientious of.
It should be about the same, except that you need to call First (or, to match your code, Last), not Where.
Calling Where will give you a set of matching items (an IEnumerable<Box>); you only want one matching item.
In general, when using LINQ, you need to be aware of deferred execution. In your particular case, it's irrelevant, since you're getting a single item.
The difference is not important unless you've identified that this particular loop as a performance bottleneck through profiling.
If profiling does find it to be a problem, then you'll want to look into alternate storage. Store the data in a dictionary which provides faster lookup than looping through an array.
If micro-optimization is your thing, LINQ performs worse, this is just one article, there are a lot of other posts you can find.
Micro optimization will kill you.
First, finish the whole class, then, if you have performance problems, run a profiler and check for the hotspots of the application.
Make sure you're using the best algorithms you can, then turn to micro optimizations like this.
In case you already did :
Slow -> Fast
LINQ < foreach < for < unsafe for (The last option is not recommended).
Abstractions will make your code slower, 95% of the time.
The fastest is when you are using for loop. But the difference is so small that you are ignore it. It will only matter if you are building a real-time application but then for those applications maybe C# is not the best choice anyway!
If AllBoxes is an IQueryable, it can be faster than the loop, because the queryable could have an optimized implementation of the Where-operation (for example an indexed access).
LINQ is absolutely 100% slower
Depends on what you are trying to accomplish in your program, but for the most part this is most certainly what I would call LAZY PROGRAMMER CODE...
You are going to essentially "stall-out" if you are performing any complex queries, joins etc... total p.o.s for those types of functions/methods- just don't use it. If you do this the hard/long way you will be much happier in the long run...and performance will be a world apart.
NOTE:
I would definitely not recommend LINQ for any program built for speed/synchronization tasks/computation
(i.e. HFT trading &/or AT trading i-0-i for starters).
TESTED:
It took nearly 10 seconds to complete a join in "LINQ" vs. < 1 millisecond.
LINQ vs Loop – A performance test
LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790
References:
http://ox.no/posts/linq-vs-loop-a-performance-test
http://www.schnieds.com/2009/03/linq-vs-foreach-vs-for-loop-performance.html

Fastest way to calculate primes in C#?

I actually have an answer to my question but it is not parallelized so I am interested in ways to improve the algorithm. Anyway it might be useful as-is for some people.
int Until = 20000000;
BitArray PrimeBits = new BitArray(Until, true);
/*
* Sieve of Eratosthenes
* PrimeBits is a simple BitArray where all bit is an integer
* and we mark composite numbers as false
*/
PrimeBits.Set(0, false); // You don't actually need this, just
PrimeBits.Set(1, false); // remindig you that 2 is the smallest prime
for (int P = 2; P < (int)Math.Sqrt(Until) + 1; P++)
if (PrimeBits.Get(P))
// These are going to be the multiples of P if it is a prime
for (int PMultiply = P * 2; PMultiply < Until; PMultiply += P)
PrimeBits.Set(PMultiply, false);
// We use this to store the actual prime numbers
List<int> Primes = new List<int>();
for (int i = 2; i < Until; i++)
if (PrimeBits.Get(i))
Primes.Add(i);
Maybe I could use multiple BitArrays and BitArray.And() them together?
You might save some time by cross-referencing your bit array with a doubly-linked list, so you can more quickly advance to the next prime.
Also, in eliminating later composites once you hit a new prime p for the first time - the first composite multiple of p remaining will be p*p, since everything before that has already been eliminated. In fact, you only need to multiply p by all the remaining potential primes that are left after it in the list, stopping as soon as your product is out of range (larger than Until).
There are also some good probabilistic algorithms out there, such as the Miller-Rabin test. The wikipedia page is a good introduction.
Parallelisation aside, you don't want to be calculating sqrt(Until) on every iteration. You also can assume multiples of 2, 3 and 5 and only calculate for N%6 in {1,5} or N%30 in {1,7,11,13,17,19,23,29}.
You should be able to parallelize the factoring algorithm quite easily, since the Nth stage only depends on the sqrt(n)th result, so after a while there won't be any conflicts. But that's not a good algorithm, since it requires lots of division.
You should also be able to parallelize the sieve algorithms, if you have writer work packets which are guaranteed to complete before a read. Mostly the writers shouldn't conflict with the reader - at least once you've done a few entries, they should be working at least N above the reader, so you only need a synchronized read fairly occasionally (when N exceeds the last synchronized read value). You shouldn't need to synchronize the bool array across any number of writer threads, since write conflicts don't arise (at worst, more than one thread will write a true to the same place).
The main issue would be to ensure that any worker being waited on to write has completed. In C++ you'd use a compare-and-set to switch to the worker which is being waited for at any point. I'm not a C# wonk so don't know how to do it that language, but the Win32 InterlockedCompareExchange function should be available.
You also might try an actor based approach, since that way you can schedule the actors working with the lowest values, which may be easier to guarantee that you're reading valid parts of the the sieve without having to lock the bus on each increment of N.
Either way, you have to ensure that all workers have got above entry N before you read it, and the cost of doing that is where the trade-off between parallel and serial is made.
Without profiling we cannot tell which bit of the program needs optimizing.
If you were in a large system, then one would use a profiler to find that the prime number generator is the part that needs optimizing.
Profiling a loop with a dozen or so instructions in it is not usually worth while - the overhead of the profiler is significant compared to the loop body, and about the only ways to improve a loop that small is to change the algorithm to do fewer iterations. So IME, once you've eliminated any expensive functions and have a known target of a few lines of simple code, you're better off changing the algorithm and timing an end-to-end run than trying to improve the code by instruction level profiling.
#DrPizza Profiling only really helps improve an implementation, it doesn't reveal opportunities for parallel execution, or suggest better algorithms (unless you've experience to the otherwise, in which case I'd really like to see your profiler).
I've only single core machines at home, but ran a Java equivalent of your BitArray sieve, and a single threaded version of the inversion of the sieve - holding the marking primes in an array, and using a wheel to reduce the search space by a factor of five, then marking a bit array in increments of the wheel using each marking prime. It also reduces storage to O(sqrt(N)) instead of O(N), which helps both in terms of the largest N, paging, and bandwidth.
For medium values of N (1e8 to 1e12), the primes up to sqrt(N) can be found quite quickly, and after that you should be able to parallelise the subsequent search on the CPU quite easily. On my single core machine, the wheel approach finds primes up to 1e9 in 28s, whereas your sieve (after moving the sqrt out of the loop) takes 86s - the improvement is due to the wheel; the inversion means you can handle N larger than 2^32 but makes it slower. Code can be found here. You could parallelise the output of the results from the naive sieve after you go past sqrt(N) too, as the bit array is not modified after that point; but once you are dealing with N large enough for it to matter the array size is too big for ints.
You also should consider a possible change of algorithms.
Consider that it may be cheaper to simply add the elements to your list, as you find them.
Perhaps preallocating space for your list, will make it cheaper to build/populate.
Are you trying to find new primes? This may sound stupid, but you might be able to load up some sort of a data structure with known primes. I am sure someone out there has a list. It might be a much easier problem to find existing numbers that calculate new ones.
You might also look at Microsofts Parallel FX Library for making your existing code multi-threaded to take advantage of multi-core systems. With minimal code changes you can make you for loops multi-threaded.
There's a very good article about the Sieve of Eratosthenes: The Genuine Sieve of Eratosthenes
It's in a functional setting, but most of the opimization do also apply to a procedural implementation in C#.
The two most important optimizations are to start crossing out at P^2 instead of 2*P and to use a wheel for the next prime numbers.
For concurrency, you can process all numbers till P^2 in parallel to P without doing any unnecessary work.
void PrimeNumber(long number)
{
bool IsprimeNumber = true;
long value = Convert.ToInt32(Math.Sqrt(number));
if (number % 2 == 0)
{
IsprimeNumber = false;
MessageBox.Show("No It is not a Prime NUmber");
return;
}
for (long i = 3; i <= value; i=i+2)
{
if (number % i == 0)
{
MessageBox.Show("It is divisible by" + i);
IsprimeNumber = false;
break;
}
}
if (IsprimeNumber)
{
MessageBox.Show("Yes Prime NUmber");
}
else
{
MessageBox.Show("No It is not a Prime NUmber");
}
}

Categories