I need help in creating a custom partitioner for PLINQ that will let me iterate over IEnumerable with length greater than Int32.
This is in reference to this question where the answer given to me was write a custom partitioner:
PLINQ query giving overflow exception
I have tried messing around with these from Dr. Dobbs tutorial but I'm not sure what I need to override to use a long for the index:
http://www.drdobbs.com/windows/custom-parallel-partitioning-with-net-4/224600406?pgno=3
Maybe I'm not going about this in the right way.
I can use Parallel.ForEach all day for any size that the IEnumerable I'm iterating over gets to.
But it's much slower than using PLINQ, which would make sense seeing that every iteration is a set of calculations on a combination of letters/numbers and it's the same calculation for every string combination, and I've read that Parallel.ForEach isn't best suited for this. PLINQ is using chunk partitioning and I believe that's why it's doing the iterations so much faster.
Is there a way I can tune Parallel.ForEach to give me the same speed as PLINQ but at the same time allow me to iterate over sizes greater than 2.14 billion?
ETA: Is there no solution to this, starting to thing after seeing Custom partition examples they have nothing to do with the issue of int to long.
Related
I have an IEnumerable of actions and they are decendent ordered by the time they will consume when executing. Now i want all of them to be executed in parallel. Are there any better solutions than this one?
IEnumerable<WorkItem> workItemsOrderedByTime = myFactory.WorkItems.DecendentOrderedBy(t => t.ExecutionTime);
Parallel.ForEach(workItemsOrderedByTime, t => t.Execute(), Environment.ProcessorCount);
So my idea is to first execute all expensice tasks in terms of time they need to be done.
EDIT: The question is if there is a better solution to get all done in minimum of time.
To solve your XY Problem of
Because otherwise it can happen that 9 of 10 tasks are finished and the last one is executed on 1 core and all other cores are doing nothing.
What you need to do is tell Parallel.ForEach to only take one item from the source list at a time. That way when you are down to the last items you won't have a bunch of slow work items all in a single core's queue.
This can be done by using Partitioner.Create and passing in EnumerablePartitionerOptions.NoBuffering
Parallel.ForEach(Partitioner.Create(workItems, EnumerablePartitionerOptions.NoBuffering),
new ParallelOptions{MaxDegreeOfParallelism = Environment.ProcessorCount},
t => t.Execute());
By default there is no execution order guarantee in Parallel.ForEach
That is why your call to DecendentOrderedBy does not do anything good. Though it might do something bad: in case default partitioner decides to do a range partition dividing say 12 WorkItems into 4 groups of 3 items, by the order in IEnumerable. Then first core has much more work to do, thus creating the problem you try to avoid.
Easy fix to (2) is explained in the answer by Scott. If Parallel.ForEach takes just one item then you naturally get some load balancing. In most cases this will work fine
The optimal (in most cases) solution for an ordered IEnumerable (as you have) will be Striped Partitioning number of buckets = number of cores. AFIK there you don't get this out-of-the-box in .NET. But you can provide a custom OrderablePartitioner that will partition data just this way.
I am sorry to say it but: "No free lunch"
I have a dilemma. I have to implement prioritized queue (custom sort order). I need to insert/process/delete a lot of messages per second by using it (~100-1000).
Which design is faster at run-time?
1) custom sorted by priority collection (list)
2) list(non-sorted collection) + linq query all time when I need to process (dequeue) message
3) something else
ADDED:
SOLUTION:
List (Dictionary) of queues by priority: SortedList<int, VPair<bool, Queue<MyMessage>>>
where int - priority, bool - true if it is not empty queue
Whats your read/write ratio? Are multiple threads involved, if so, how?
As always when asking about performance, benchmark both code-paths and see for yourself (this is especially true the more specific your problem domain is).
The only way to know for sure is to measure the performance for yourself.
Well, finding an element in an unsorted data structure takes O(n) on average (one pass over the data structure). Binary Search trees have an average insertion complexity on O(log n) and also an average lookup complexity of O(log n). So in theory using something like that would be faster. In reality the overhead or the shape of the data might kill the theoretical advantage.
Also if your custom sort order can change at runtime you might have to rebuild the sorted data structure which is an additional performance hit.
In the end: If it is important for your application then try the different approaches and benchmark it yourself - it's the only way to be certain that it works.
Introducing sorting will always incur an insertion performance overhead as far as I'm aware. If there's no need for the sorting then use a nice generic Dictionary which will provide a quick lookup based on your unique key.
When an IEnumerable needs both to be sorted and for elements to be removed, are there advantages/drawback of performing the stages in a particular order? My performance tests appear to indicate that it's irrelevant.
A simplified (and somewhat contrived) example of what I mean is shown below:
public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
{
IEnumerable<DataItem> result = this.GetDataItems();
result.Sort(sortOrder);
result.RemoveAll(item => !item.Display);
result = result.Take(maximum);
return result;
}
If your tests indicate it's irrelevant, than why worry about it? Don't optimize before you need to, only when it becomes a problem. If you find a problem with performance, and have used a profiler, and have found that that method is the hotspot, then you can worry more about it.
On second thought, have you considered using LINQ? Those calls could be replaced with a call to Where and OrderBy, both of which are deferred, and then calling Take, like you have in your example. The LINQ libraries should find the best way of doing this for you, and if your data size expands to the point where it takes a noticeable amount of time to process, you can use PLINQ with a simple call to AsParallel.
You might as well RemoveAll before sorting so that you'll have fewer elements to sort.
I think that Sort() method would usually have complexity of O(n*log(n)), and RemoveAll() just O(n), so in general it is probably better to remove items first.
You'd want something like this:
public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
{
IEnumerable<DataItem> result = this.GetDataItems();
return result
.Where(item => item.Display)
.OrderBy(sortOrder)
.Take(maximum);
}
There are two answers that are correct, but won't teach you anything:
It doesn't matter.
You should probably do RemoveAll first.
The first is correct because you said your performance tests showed it's irrelevant. The second is correct because it will have an effect on larger datasets.
There's a third answer that also isn't very useful: Sometimes it's faster to do removals afterwards.
Again, it doesn't actually tell you anything, but "sometimes" always means there is more to learn.
There's also only so much value in saying "profile first". What if profiling shows that 90% of the time is spent doing x.Foo(), which it does in a loop? Is the problem with Foo(), with the loop or with both? Obviously if we can make both more efficient we should, but how do we reason about that without knowledge outside of what a profiler tells us?
When something happens over multiple items (which is true of both RemoveAll and Sort) there are five things (I'm sure there are more I'm not thinking of now) that will affect the performance impact:
The per-set constant costs (both time and memory). How much it costs to do things like calling the function that we pass a collection to, etc. These are almost always negligible, but there could be some nasty high cost hidden there (often because of a mistake).
The per-item constant costs (both time and memory). How much it costs to do something that we do on some or all of the items. Because this happens multiple times, there can be an appreciable win in improving them.
The number of items. As a rule the more items, the more the performance impact. There are exceptions (next item), but unless those exceptions apply (and we need to consider the next item to know when this is the case), then this will be important.
The complexity of the operation. Again, this is a matter of both time-complexity and memory-complexity, but here the chances that we might choose to improve one at the cost of another. I'll talk about this more below.
The number of simultaneous operations. This can be a big difference between "works on my machine" and "works on the live system". If a super time-efficient approach uses .5GB of memory is tested on a machine with 2GB of memory available, it'll work wonderfully, but when you move it to a machine with 8GB of memory available and have multiple concurrent users, it'll hit a bottleneck at 16 simultaneous operations, and suddenly what was beating other approaches in your performance measurements becomes the application's hotspot.
To talk about complexity a bit more. The time complexity is a measure of how the time taken to do something relates the number of items it is done with, while memory complexity is a measure of how the memory used relates to that same number of items. Obtaining an item from a dictionary is O(1) or constant because it takes the same amount of time however large the dictionary is (not strictly true, strictly it "approaches" O(1), but it's close enough for most thinking). Finding something in an already sorted list can be O(log2 n) or logarithmic. Filtering through a list will be linear or O(n). Sorting something using a quicksort (which is what Sort uses) tends to be linearithmic or O(n log2 n) but in its worse case - against a list already sorted - will be quadratic O(n2).
Considering these, with a set of 8 items, an O(1) operation will take 1k seconds to do something, where k is a constant amount of time, O(log2 n) means 3k seconds, O(n) means 8k, O(n log2 n) means 24k and O(n2) means 64k. These are the most commonly found though there are plenty of others like O(nm) which is affected by two different sizes, or O(n!) which would be 40320k.
Obviously, we want as low a complexity as possible, though since k will be different in each case, sometimes the best solution for a small set has a high complexity (but low k constant) though a lower-complexity case will beat it with larger input.
So. Let's go back to the cases you are considering, viz filtering followed by sorting vs. sorting followed by filtering.
Per-set constants. Since we are moving two operations around but still doing both, this will be the same either way.
Per-item constants. Again, we're still doing the same things per item in either case, so no effect.
Number of items. Filtering reduces the number of items. Therefore the sooner we filter items, the more efficient the rest of the operation. Therefore doing RemoveAll first wins in this regard.
Complexity of the operation. It's either a O(n) followed by a average-case-O(log2 n)-worse-case-O(n2), or it's an average-case-O(log2 n)-worse-case-O(n2) followed by an O(n). Same either way.
Number of simultaneous cases. Total memory pressure will be relieved the sooner we remove some items, (slight win for RemoveAll first).
So, we've got two reasons to consider RemoveAll first as likely to be more efficient and none to consider it likely to be less efficient.
We would not assume that we were 100% guaranteed to be correct here. For a start we could simply have made a mistake in our reasoning. For another, there could be other factors we've dismissed as irrelevant that were actually pertinent. It is still true that we should profile before optimising, but reasoning about the sort of things I've mentioned above will both make us more likely to write performant code in the first place (not the same as optimising; but a matter of picking between options when readability, clarity and correctness is equal either way) and makes it easier to find likely ways to improve those things that profiling has found to be troublesome.
For a slightly different but relevant case, consider if the criteria sorted on matched those removed on. E.g. if we were to sort by date and remove all items after a given date.
In this case, if the list deallocates on all removals, it'll still be O(n), but with a much smaller constant. Alternatively, if it just moved the "last-item" pointer*, it becomes O(1). Finding the pointer is O(log2 n), so here there's both reasons to consider that filtering first will be faster (the reasons given above) and that sorting first will be faster (that removal can be made a much faster operation than it was before). With this sort of case it becomes only possible to tell by extending our profiling. It is also true that the performance will be affected by the type of data sent, so we need to profile with realistic data, rather than artificial test data, and we may even find that what was the more performant choice becomes the less performant choice months later when the dataset it is used on changes. Here the ability to reason becomes even more important, because we should note the possibility that changes in real-world use may make this change in this regard, and know that it is something we need to keep an eye on throughout the project's life.
(*Note, List<T> does not just move a last-item pointer for a RemoveRange that covers the last item, but another collection could.)
It would probably be better to the RemoveAll first, although it would only make much of a difference if your sorting comparison was intensive to calculate.
Is this:
Box boxToFind = AllBoxes.FirstOrDefault(box => box.BoxNumber == boxToMatchTo.BagNumber);
Faster or slower than this:
Box boxToFind ;
foreach (Box box in AllBoxes)
{
if (box.BoxNumber == boxToMatchTo.BoxNumber)
{
boxToFind = box;
}
}
Both give me the result I am looking for (boxToFind). This is going to run on a mobile device that I need to be performance conscientious of.
It should be about the same, except that you need to call First (or, to match your code, Last), not Where.
Calling Where will give you a set of matching items (an IEnumerable<Box>); you only want one matching item.
In general, when using LINQ, you need to be aware of deferred execution. In your particular case, it's irrelevant, since you're getting a single item.
The difference is not important unless you've identified that this particular loop as a performance bottleneck through profiling.
If profiling does find it to be a problem, then you'll want to look into alternate storage. Store the data in a dictionary which provides faster lookup than looping through an array.
If micro-optimization is your thing, LINQ performs worse, this is just one article, there are a lot of other posts you can find.
Micro optimization will kill you.
First, finish the whole class, then, if you have performance problems, run a profiler and check for the hotspots of the application.
Make sure you're using the best algorithms you can, then turn to micro optimizations like this.
In case you already did :
Slow -> Fast
LINQ < foreach < for < unsafe for (The last option is not recommended).
Abstractions will make your code slower, 95% of the time.
The fastest is when you are using for loop. But the difference is so small that you are ignore it. It will only matter if you are building a real-time application but then for those applications maybe C# is not the best choice anyway!
If AllBoxes is an IQueryable, it can be faster than the loop, because the queryable could have an optimized implementation of the Where-operation (for example an indexed access).
LINQ is absolutely 100% slower
Depends on what you are trying to accomplish in your program, but for the most part this is most certainly what I would call LAZY PROGRAMMER CODE...
You are going to essentially "stall-out" if you are performing any complex queries, joins etc... total p.o.s for those types of functions/methods- just don't use it. If you do this the hard/long way you will be much happier in the long run...and performance will be a world apart.
NOTE:
I would definitely not recommend LINQ for any program built for speed/synchronization tasks/computation
(i.e. HFT trading &/or AT trading i-0-i for starters).
TESTED:
It took nearly 10 seconds to complete a join in "LINQ" vs. < 1 millisecond.
LINQ vs Loop – A performance test
LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790
References:
http://ox.no/posts/linq-vs-loop-a-performance-test
http://www.schnieds.com/2009/03/linq-vs-foreach-vs-for-loop-performance.html
I'm experimenting with the new System.Threading.Parallel methods like parallel for and foreach.
They seem to work nicely but I need a way to increase the number of concurrent threads that are executed which are 8 (I have a Quad core).
I know there is a way I just can find the place thy hidden the damn property.
Gilad.
quote:
var query = from item in source.AsParallel().WithDegreeOfParallelism(10)
where Compute(item) > 42
select item;
In cases where a query is performing a significant amount of non-compute-bound work such as File I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
from: MSDN
IF you are using Parallel.For or Parallel.ForEach you can specify a ParallelOptions object which has a property MaxDegreesOfParallelism. Unfortunately this is just a maximum limit as the name suggests, and does not provide a lower bound guarantee. For the relationsship to WithDegreeOfParallelism see this blog post.
MAY NOT - enough said. Blindy commented it correctly