Speed up PathGeometry.Combine (C#/.NET)?

I've got two geometrical data sets to match, both containing tens of thousands of PathGeometries. To be exact, I need to find the areas where one set overlaps the other, so I have a loop like:
foreach (var p1 in firstGeometries)
{
    foreach (var p2 in secondGeometries)
    {
        PathGeometry sharedArea = PathGeometry.Combine(p1, p2, GeometryCombineMode.Intersect, null);
        if (sharedArea.GetArea() > 0) // only true 0.01% of the time
        {
            [...]
        }
    }
}
Now, due to the nature of my data, 99.99% of the time the combinations do not intersect at all. Profiling told me this is the most expensive part of the calculation.
Is there any way to speed this up, or a faster way to detect collisions between two PathGeometries?

Adding a new answer since I'm more familiar with the Geometry class now. First, I'd test for intersection using the bounding boxes. Though honestly, PathGeometry.Combine probably already does this. So what's the real problem? Testing the boundary of each object against the boundary of every other object is quadratic time. If you instead found intersections (or collisions, as some areas of CS call them) using a quadtree, you could see significant performance gains. Hard to say without testing and fine-tuning, though. http://gamedev.tutsplus.com/tutorials/implementation/quick-tip-use-quadtrees-to-detect-likely-collisions-in-2d-space/
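For reference, a minimal sketch of that bounding-box pre-check, using the standard Geometry.Bounds and Rect.IntersectsWith from WPF applied to the loop from the question:
foreach (var p1 in firstGeometries)
{
    Rect bounds1 = p1.Bounds; // cache it; Bounds is not free to compute
    foreach (var p2 in secondGeometries)
    {
        if (!bounds1.IntersectsWith(p2.Bounds))
            continue; // cheap rejection, skips the expensive Combine

        PathGeometry sharedArea = PathGeometry.Combine(p1, p2, GeometryCombineMode.Intersect, null);
        if (sharedArea.GetArea() > 0)
        {
            // ...
        }
    }
}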

Maybe you can use the Parallel.ForEach method, if you have more than one CPU core available.
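For example, a sketch parallelizing the outer loop (note that WPF geometries are Freezables, so call Freeze() on them first if they will be read from multiple threads):
Parallel.ForEach(firstGeometries, p1 =>
{
    foreach (var p2 in secondGeometries)
    {
        PathGeometry sharedArea = PathGeometry.Combine(p1, p2, GeometryCombineMode.Intersect, null);
        if (sharedArea.GetArea() > 0)
        {
            // collect results in a thread-safe structure, e.g. a ConcurrentBag
        }
    }
});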

I am not sure about the exact nature of each path geometry, but assuming they are polygons:
You can sort each collection by its bounds. That way, once a candidate's bounds start past the current outer geometry, none of the remaining elements in the inner loop can produce an area greater than 0, and you can break out of the loop early (see the sketch below).
This can significantly improve the running time, since the intersection test fails most of the time.
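A rough sketch of that early-out, assuming the inner set is sorted by the left edge of its bounds (my reading of the suggestion; the exact break condition depends on the sort key you pick):
var sorted = secondGeometries.OrderBy(g => g.Bounds.Left).ToList(); // sort once, up front
foreach (var p1 in firstGeometries)
{
    double right1 = p1.Bounds.Right;
    foreach (var p2 in sorted)
    {
        if (p2.Bounds.Left > right1)
            break; // everything after this starts past p1 and cannot overlap it
        // run the full Combine/GetArea test here
    }
}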

I haven't tested it, but it may be helpful to use GetFlattenedPathGeometry and combine the results of that instead. Depending on the type of geometry you're combining, it's likely getting converted to a polygonal approximation on every call. Using GetFlattenedPathGeometry ahead of time will hopefully eliminate that redundant computation.
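Untested as well, but the pre-computation could look something like this (GetFlattenedPathGeometry is the standard WPF call):
// Flatten once, outside the loops, instead of implicitly on every Combine.
var flatFirst = firstGeometries.Select(g => g.GetFlattenedPathGeometry()).ToList();
var flatSecond = secondGeometries.Select(g => g.GetFlattenedPathGeometry()).ToList();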

You definitely need a "broad and narrow phase" to do this.
Bounding-Box checks are a must for something like this.
A much simpler alternative to a quadtree is "spatial hashing" (sometimes also called "spatial indexing"). This technique should reduce the time needed a thousandfold. For a reference see: http://www.playchilla.com/as3-spatial-hash - it's in AS3, but it's trivial to convert to C#.
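A minimal C# sketch of the idea; cellSize is a tuning knob you would pick from your data:
double cellSize = 100.0; // roughly the size of a typical geometry
var grid = new Dictionary<(int, int), List<PathGeometry>>();
foreach (var p2 in secondGeometries)
{
    // Register p2 in every grid cell its bounding box touches.
    Rect b = p2.Bounds;
    for (int cx = (int)Math.Floor(b.Left / cellSize); cx <= (int)Math.Floor(b.Right / cellSize); cx++)
        for (int cy = (int)Math.Floor(b.Top / cellSize); cy <= (int)Math.Floor(b.Bottom / cellSize); cy++)
        {
            if (!grid.TryGetValue((cx, cy), out var bucket))
                grid[(cx, cy)] = bucket = new List<PathGeometry>();
            bucket.Add(p2);
        }
}
// Broad phase: for each p1, only the geometries registered in the cells its own
// bounds cover are candidates; run the expensive Combine only on those.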

Related

Looping through two collections - performance and optimization possibilities

This is probably a very common problem with a lot of existing answers; I was not able to find one because I am not sure how to search for it.
I have two collections of objects - both come from the database, and in some cases the collections are of the same object type. Further, I need to do some operations for every combination of elements from the two collections. For example:
foreach (var a in collection1)
{
    foreach (var b in collection2)
    {
        if (a.Name == b.Name && a.Value != b.Value)
        {
            // do something with this combination
        }
        else
        {
            // do something else
        }
    }
}
This is very inefficient, and it gets slower as the number of objects in both collections grows.
What is the best way to solve this type of problem?
EDIT:
I am using .NET 4 at the moment so I am also interested in suggestions using Parallelism to speed that up.
EDIT 2:
I have added above an example of the business rules that need to be performed on each combination of objects. However, the business rules defined in the example can vary.
EDIT 3:
For example, inside the loop the following will be done:
If the business rules are satisfied (see above), a record will be created in the database with a reference to object A and object B. This is one of the operations I need to do. (Operations will be configurable from child classes using this class.)
If you really have to process every item in list b for each item in list a, then it's going to take time proportional to a.Count * b.Count. There's nothing you can do to prevent it. Adding parallel processing will give you a linear speedup, but that's not going to make a dent in the processing time if the lists are even moderately large.
How large are these lists? Do you really have to check every combination of a and b? Can you give us some more information about the problem you're trying to solve? I suspect that there's a way to bring a more efficient algorithm to bear, which would reduce your processing time by orders of magnitude.
Edit after more info posted
I know that the example you posted is just an example, but it shows that you can find a better algorithm for at least some of your cases. In this particular example, you could sort a and b by name, and then do a straight merge. Or, you could sort b into an array or list, and use binary search to look up the names. Either of those two options would perform much better than your nested loops. So much better, in fact, that you probably wouldn't need to bother with parallelizing things.
Look at the numbers. If your a has 4,000 items in it and b has 100,000 items in it, your nested loop will do 400 million comparisons (a.Count * b.Count). But sorting is only n log n, and the merge is linear. So sorting and then merging will be approximately (a.Count * 12) + (b.Count * 17) + a.Count + b.Count (12 ≈ log2 4,000 and 17 ≈ log2 100,000), or in the neighborhood of 2 million comparisons. That's approximately 200 times faster.
Compare that to what you can do with parallel processing: only a linear speedup. If you have four cores and you get a pure linear speedup, you'll only cut your time by a factor of four. The better algorithm cut the time by a factor of 200, with a single thread.
You just need to find better algorithms.
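A sketch of the sort-and-merge idea, using the Name key from the example (assuming names are unique on each side; with duplicates you would advance over the whole run of equal names):
var sortedA = collection1.OrderBy(x => x.Name, StringComparer.Ordinal).ToList();
var sortedB = collection2.OrderBy(x => x.Name, StringComparer.Ordinal).ToList();
int i = 0, j = 0;
while (i < sortedA.Count && j < sortedB.Count)
{
    int cmp = string.CompareOrdinal(sortedA[i].Name, sortedB[j].Name);
    if (cmp < 0) i++;        // a's name is smaller: no match, advance a
    else if (cmp > 0) j++;   // b's name is smaller: no match, advance b
    else
    {
        if (sortedA[i].Value != sortedB[j].Value)
        {
            // do something with this combination
        }
        i++; j++;
    }
}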
LINQ might also provide a good solution. I'm not an expert with LINQ, but it seems like it should be able to make quick work of something like this.
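For instance, a LINQ join on Name is hash-based and roughly linear, so it avoids the nested loops entirely (a sketch; note a join only yields matching pairs, so the "do something else" branch would need separate handling):
var pairs = from a in collection1
            join b in collection2 on a.Name equals b.Name
            where a.Value != b.Value
            select new { a, b };
foreach (var pair in pairs)
{
    // do something with pair.a and pair.b
}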
If you need to check all the variants one by one, you can't do anything better. BUT you can parallelize the loops. For example, if you are using C# 4.0, you can use a parallel foreach loop.
You can find an example here: http://msdn.microsoft.com/en-us/library/dd460720.aspx
foreach (var a in collection1)
{
    Parallel.ForEach(collection2, b =>
    {
        // do something with a and b
    }); // close lambda expression
}
In the same way, you can parallelize the first loop as well.
First of all, think about why you are searching the second collection with a value from the first collection.
For example, if you only want to know whether a value exists in the second collection, you should put the second collection in a HashSet, which allows fast lookups. Creating the HashSet and then accessing it is O(1) per lookup, versus O(n) for looping over the collection.
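A minimal sketch of that lookup, using the Name rule from the question's example:
// Build the set once: O(m). Each Contains afterwards is O(1) on average.
var names = new HashSet<string>(collection2.Select(b => b.Name));
foreach (var a in collection1)
{
    if (names.Contains(a.Name))
    {
        // a has at least one counterpart in collection2
    }
}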
And to parallelize both loops:
Parallel.ForEach(a, currentA => Parallel.ForEach(b, currentB =>
{
    // do something with currentA and currentB
}));

C# - Swapping objects without a placeholder

I was told that using a temp object was not the most effective way to swap elements in an array.
Such as:
Object[] objects = new Object[10];
// -- Assign the 10 objects some values
var temp = objects[2];
objects[2] = objects[4];
objects[4] = temp;
Is it really possible to swap the elements of the array without using another object?
I know that with numeric types you can (e.g. the arithmetic or XOR swap tricks), but I cannot figure out how this would be done with any other object type.
Swapping objects with a temporary is the most correct way of doing it. That should rank far higher in your priorities than speed. It's pretty easy to write fast software that outputs garbage.
When dealing with objects, you just cannot do it differently. And this is not at all inefficient. An extra reference variable that points to an already existing object is hardly going to be a problem.
But even with numerical values, most clever techniques fail to produce correct results at some point.
It's possible the person who told you this was thinking of something like this:
objects[2] = Interlocked.Exchange(ref objects[4], objects[2]);
Of course, just because this is one line doesn't mean it isn't also using a temporary variable. It's just hidden in the form of a method parameter (the reference to objects[2] is copied and passed to the Exchange method), making it less obvious.
You could do it with Interlocked.Exchange, but that won't be faster than using a temp variable... not that speed is likely to matter for this sort of problem outside of an interview.
Every example I can find online uses exactly this method to swap two elements in an array. In fact, if this were something I was doing often, I would definitely consider a generic extension method, akin to this example:
http://w3mentor.com/learn/asp-dot-net-c-sharp/c-collections-and-generics/generically-swapping-two-elements-in-array-using-ccsharp/
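Such an extension method might look like this (a sketch along the lines of the linked example):
static class ArrayExtensions
{
    public static void Swap<T>(this T[] array, int i, int j)
    {
        T temp = array[i]; // the temporary is still here, just hidden behind the call
        array[i] = array[j];
        array[j] = temp;
    }
}
Usage: objects.Swap(2, 4);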
The only time you should worry about the temporary is when the copy is expensive - when the object is massive, the time taken to copy it is significant, and you are doing it a lot. For example, in a sort of 10,000 items that moves an element from position 9,999 to 1 one step at a time, testing at each step whether it should move, swapping at every step would be a drain; a more efficient way would be to do all the tests first and then move once.
Swapping via deconstruction (C# 7 and later):
(first, second) = (second, first);

Is LINQ faster, slower, or the same?

Is this:
Box boxToFind = AllBoxes.FirstOrDefault(box => box.BoxNumber == boxToMatchTo.BoxNumber);
Faster or slower than this:
Box boxToFind = null;
foreach (Box box in AllBoxes)
{
    if (box.BoxNumber == boxToMatchTo.BoxNumber)
    {
        boxToFind = box;
    }
}
Both give me the result I am looking for (boxToFind). This is going to run on a mobile device where I need to be performance-conscious.
It should be about the same, except that you need to call First (or, to match your code, Last), not Where.
Calling Where will give you a set of matching items (an IEnumerable<Box>); you only want one matching item.
In general, when using LINQ, you need to be aware of deferred execution. In your particular case, it's irrelevant, since you're getting a single item.
The difference is not important unless you've identified this particular loop as a performance bottleneck through profiling.
If profiling does find it to be a problem, then you'll want to look into alternate storage. Storing the data in a dictionary provides faster lookups than looping through an array.
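For example (a sketch, assuming BoxNumber is unique so it can serve as the key):
// Build once: O(n). Each lookup afterwards is O(1) instead of scanning the list.
var boxesByNumber = AllBoxes.ToDictionary(box => box.BoxNumber);
boxesByNumber.TryGetValue(boxToMatchTo.BoxNumber, out Box boxToFind);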
If micro-optimization is your thing, LINQ performs worse. This is just one article; there are a lot of other posts you can find.
Micro optimization will kill you.
First, finish the whole class, then, if you have performance problems, run a profiler and check for the hotspots of the application.
Make sure you're using the best algorithms you can, then turn to micro optimizations like this.
In case you already did :
Slow -> Fast
LINQ < foreach < for < unsafe for (The last option is not recommended).
Abstractions will make your code slower, 95% of the time.
The fastest is a plain for loop, but the difference is so small that you can ignore it. It will only matter if you are building a real-time application, and for those applications C# is probably not the best choice anyway!
If AllBoxes is an IQueryable, it can be faster than the loop, because the queryable may have an optimized implementation of the Where operation (for example, indexed access).
LINQ is absolutely 100% slower
It depends on what you are trying to accomplish in your program, but for the most part this is most certainly what I would call LAZY PROGRAMMER CODE...
You are going to essentially stall out if you are performing any complex queries, joins, etc. It is a total p.o.s. for those types of functions/methods; just don't use it there. If you do it the hard/long way, you will be much happier in the long run, and performance will be a world apart.
NOTE:
I would definitely not recommend LINQ for any program built for speed/synchronization tasks/computation
(i.e. HFT trading &/or AT trading i-0-i for starters).
TESTED:
In the test linked below, 1,000 iterations of a LINQ join took about 4 seconds in total, vs. under a tenth of a second for the equivalent loop (well under a millisecond per iteration).
LINQ vs Loop – a performance test:
LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790
References:
http://ox.no/posts/linq-vs-loop-a-performance-test
http://www.schnieds.com/2009/03/linq-vs-foreach-vs-for-loop-performance.html

Algorithm for generating an array of non-equal costs for a transport problem optimization

I have an optimizer that solves a transportation problem, using a cost matrix of all the possible paths.
The optimiser works fine, but if two of the costs are equal, the solution contains one more path than the minimum number of paths. (Think of it as load-balancing routers: if two routes have the same cost, you'll use them both.)
I would like the minimum number of routes, and to do that I need a cost matrix that doesn't have two costs that are equal within a certain tolerance.
At the moment, I'm passing the cost matrix through a baking function which tests every entry for equality against each of the other entries and moves it by a fixed percentage if it matches. However, this approach requires N^2 comparisons, and if the starting values are all the same, the last cost ends up about (1+r)^N times bigger (where r is the fixed percentage). There is also the problem that by moving a value by the percentage, you can land on top of another value. So the problem seems to have an element of recursion, or at least repeated checking, which bloats the code.
The current implementation is basically not very good (I won't paste my GOTO-using code here for you all to mock), and I'd like to improve it. Is there a name for what I'm after, and is there a standard implementation?
Example:
{1,1,2,3,4,5} (tol = 0.05) becomes {1,1.05,2,3,4,5}
Instead of comparing all the values to each other, try a more linear approach. A rough C# version of the original pseudo-code (rows, cols, costMatrix, and percentage stand in for your data; the while loop also handles the case where a nudged value lands on another value you've already seen):
var seen = new HashSet<double>();
for (int i = 0; i < rows; i++)
{
    for (int j = 0; j < cols; j++)
    {
        // Nudge the cost until it no longer collides with any value seen so far.
        // (Exact equality here; a tolerance-based check would need a sorted structure.)
        while (seen.Contains(costMatrix[i, j]))
            costMatrix[i, j] += percentage;
        seen.Add(costMatrix[i, j]);
    }
}
Not to be too picky, but I think you're describing a shortest path problem.
The "Transportation Problem" in OR is a much more specific problem with one set of origins and one set of destinations. In your problem, you have paths through several points, but sometimes you get two shortest paths because the costs add up to the same total. Right?
Here's a good paper on dealing with redundancy in all pairs shortest path problems.

Minimization of f(x,y) where x and y are integers

I was wondering if anyone had any suggestions for minimizing a function, f(x,y), where x and y are integers. I have researched lots of minimization and optimization techniques, like BFGS and others out of GSL, and things out of Numerical Recipes, and so far I have tried implementing a couple of different schemes. The first picks the direction of largest descent among f(x+1,y), f(x-1,y), f(x,y+1), f(x,y-1) and follows that direction with line minimization. I have also tried a downhill simplex (Nelder-Mead) method. Both methods get stuck far away from a minimum. They both appear to work on simpler functions, like finding the minimum of a paraboloid, but I think both, and especially the former, are designed for functions where x and y are real-valued (doubles). One more problem: I need to call f(x,y) as few times as possible, since it talks to external hardware and takes a couple of seconds per call. Any ideas would be greatly appreciated.
Here's an example of the error function; sorry I didn't post this before. This function takes a couple of seconds to evaluate. Also, the information we query from the device does not add to the error if it is below our desired value, only if it is above:
double Error(int x, int y)
{
    SetDeviceParams(x, y);
    double a = QueryParamA();
    double b = QueryParamB();
    double c = QueryParamC();
    double error = 0;
    // A parameter only contributes to the error when it exceeds its desired value.
    if (a >= A_desired)
    {
        error += (A_desired - a) * (A_desired - a);
    }
    if (b >= B_desired)
    {
        error += (B_desired - b) * (B_desired - b);
    }
    if (c >= C_desired)
    {
        error += (C_desired - c) * (C_desired - c);
    }
    return Math.Sqrt(error);
}
There are many, many solutions here. In fact, there are entire books and academic disciplines based on the subject. I am reading an excellent one right now: How to Solve It: Modern Heuristics.
There is no one solution that is correct - different solutions have different advantages based on specific knowledge of your function. It has even been proven that there is no one heuristic that performs best at all optimization tasks (the "no free lunch" theorem).
If you know that your function is quadratic, you can use Gauss-Newton to find the minimum in one step. A genetic algorithm can be a great general-purpose tool, or you can try simulated annealing, which is less complicated.
Have you looked at genetic algorithms? They are very, very good at finding minima and maxima while avoiding getting stuck in local ones.
How do you define f(x,y)? Minimisation is a hard problem, and how hard depends on the complexity of your function.
Genetic Algorithms could be a good candidate.
Resources:
Genetic Algorithms in Search, Optimization, and Machine Learning
Implementing a Genetic Algorithms in C#
Simple C# GA
If it's an arbitrary function, there's no neat way of doing this.
Suppose we have a function defined as:
f(x, y) = 0    if x == 100 and y == 100
          100  otherwise
How could any algorithm realistically find (100, 100) as the minimum? It could be any possible combination of values.
Do you know anything about the function you're testing?
What you are generally looking for is called an optimisation technique in mathematics. In general, they apply to real-valued functions, but many can be adapted for integer-valued functions.
In particular, I would recommend looking into non-linear programming and gradient descent. Both would seem quite suitable for your application.
If you could provide a few more details, I might be able to suggest something a little more specific.
Jon Skeet's answer is correct. You really do need information about f and its derivatives, even if f is everywhere continuous.
The easiest way to appreciate the difficulty of what you ask (minimization of f at integer values only) is to think about a function f: R -> R (a real-valued function of one real variable) that makes large excursions between individual integers. You can easily construct such a function so that there is NO correlation between the local minima on the real line and the minima at the integers, and no relationship to the first derivative either.
For an arbitrary function I see no way except brute force.
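If you do fall back to brute force, it is at least trivial to write; the catch is the seconds-per-call cost of the OP's function (a sketch assuming known bounds xMin..xMax and yMin..yMax):
double best = double.MaxValue;
int bestX = 0, bestY = 0;
for (int x = xMin; x <= xMax; x++)
{
    for (int y = yMin; y <= yMax; y++)
    {
        double e = Error(x, y); // each call talks to the hardware and takes seconds
        if (e < best) { best = e; bestX = x; bestY = y; }
    }
}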
So let's look at your problem in math-speak (this is all assuming I understand your problem fully; feel free to correct me if I am mistaken). We want to minimize the following:
sqrt(Pos(a - a_desired)^2 + Pos(b - b_desired)^2 + Pos(c - c_desired)^2)
or, in other notation,
||Pos(x - x_desired)||_2
where x = (a, b, c) and Pos(y) = max(y, 0) denotes the "positive part" (this accounts for your if statements). Finally, we restrict ourselves to solutions where x is integer-valued.
Unlike the above posters, I don't think genetic algorithms are what you want at all. In fact, I think the solution is much easier (assuming I am understanding your problem):
1) Run any optimization routine on the function above. This will give you the solution x^* = (a^*, b^*, c^*). As this function is increasing with respect to the variables, the best integer solution you can hope for is (ceil(a^*), ceil(b^*), ceil(c^*)).
Now, you say that your function is possibly hard to evaluate. There exist tools for this which are not based on heuristics; they go under the name Derivative-Free Optimization. People use these tools to optimize objectives based on simulations (I have even heard of a case where the objective function was based on crop growing yields!). Each of these methods has different properties, but in general they attempt to minimize not only the objective but also the number of objective function evaluations.
