Ordered parallel execution - C#

I have an ordered list like [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] that I am passing to a Parallel.ForEach statement. Can I somehow achieve the following ordering of execution across buckets: process the first 3 items [1, 2, 3], where the ordering within the bucket itself is not mandatory and can be [2, 1, 3] for instance, then process the next 3 items [4, 5, 6], and so on?

I'm not sure that you can do this directly, but I would suggest dividing the input list into smaller lists and then processing each sublist with Parallel.ForEach.
List<int> fruits = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
List<List<int>> ls = new List<List<int>>();
for (int i = 0; i < fruits.Count; i += 3)
{
    ls.Add(fruits.GetRange(i, Math.Min(3, fruits.Count - i)));
}
foreach (List<int> group in ls)
{
    Parallel.ForEach(group, fruit =>
    {
        // process one item of the current bucket here
    });
}
Here, 3 is the size of each sublist.
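As an aside, if you can assume .NET 6 or later, Enumerable.Chunk does the splitting in one call; a minimal sketch under that assumption:
// assumes .NET 6+, where Enumerable.Chunk is available
List<int> fruits = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
foreach (int[] group in fruits.Chunk(3))
{
    Parallel.ForEach(group, fruit =>
    {
        // process one item of the current bucket here
    });
}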

Even though the accepted answer fulfills the requirements perfectly, there is some overhead in it. First of all, since we are talking about the TPL, the data volume is probably big, so simply creating that many sublists is very memory-consuming.
Also, the solution suggested by @viveknuna does not guarantee the order of the chunks. If that is OK for you, you should probably use the answer from @DmitryBychenko with a small update:
// Partitioner lives in System.Collections.Concurrent
const int chunkSize = 3;
var array = Enumerable.Range(1, 9).ToArray();
// get the index ranges for the array, in groups of 3
var partitioner = Partitioner.Create(0, array.Length, chunkSize);
// use all the system resources
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
// use the partitioner for the chunks, so the outer parallel foreach
// will start a task for each chunk: [1, 2, 3], [4, 5, 6], [7, 8, 9]
Parallel.ForEach(partitioner, parallelOptions, part =>
{
    // the inner foreach handles one chunk's items in parallel
    // (part.Item2 - part.Item1 also covers a shorter final chunk)
    Parallel.ForEach(array.Skip(part.Item1).Take(part.Item2 - part.Item1), parallelOptions, value =>
    {
        // handle the array value in parallel
    });
});
In the given code, if you set ParallelOptions.MaxDegreeOfParallelism to 1 for the outer Parallel.ForEach, you'll get the desired ordered execution, chunk by chunk.
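A minimal sketch of that configuration, reusing partitioner and array from above (splitting the options into two instances is my reading of the suggestion, not code from the original answer):
// outer loop: one chunk at a time, in partition order
var outerOptions = new ParallelOptions { MaxDegreeOfParallelism = 1 };
// inner loop: the items within a chunk still run in parallel
var innerOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(partitioner, outerOptions, part =>
{
    Parallel.ForEach(array.Skip(part.Item1).Take(part.Item2 - part.Item1), innerOptions, value =>
    {
        // handle the array value in parallel, within the ordered chunk
    });
});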


Efficiently finding arrays with the most occurrences of the same number

Let's say I have the following nested array:
[
[1, 2, 3],
[4, 7, 9, 13],
[1, 2],
[2, 3],
[12, 15, 16]
]
I only need the arrays with the most occurrences of the same numbers. In the above example this would be:
[
[1, 2, 3],
[4, 7, 9, 13],
[12, 15, 16]
]
How can I do this efficiently with C#?
EDIT
Indeed, my question was really confusing. What I wanted to ask is: how can I eliminate a sub-array if some bigger sub-array already contains all the elements of the smaller one?
My current implementation of the problem is the following:
var allItems = new List<List<int>>{
    new List<int>{1, 2, 3},
    new List<int>{4, 7, 9, 13},
    new List<int>{1, 2},
    new List<int>{2, 3},
    new List<int>{12, 15, 16}
};
var itemsToEliminate = new List<List<int>>();
for (var i = 0; i < allItems.ToList().Count; i++)
{
    var current = allItems[i];
    var itemsToVerify = allItems.Where(item => item != current).ToList();
    foreach (var item in itemsToVerify)
    {
        bool containsSameNumbers = item.Intersect(current).Any();
        if (containsSameNumbers && item.Count > current.Count)
        {
            itemsToEliminate.Add(current);
        }
    }
}
allItems.RemoveAll(item => itemsToEliminate.Contains(item));
foreach (var item in allItems)
{
    Console.WriteLine(string.Join(", ", item));
}
allItems.RemoveAll(item => itemsToEliminate.Contains(item));
foreach(var item in allItems){
Console.WriteLine(string.Join(", ", item));
}
This does work, but the nested loops for(var i = 0; i < allItems.ToList().Count; i++) and foreach(var item in itemsToVerify) give it bad performance, especially when you know that the allItems list can contain about 10,000,000 rows.
I would remember the numbers that have already been seen.
First sort your lists by decreasing length, then check for each list whether its elements are already present.
Given your algorithm, a list is not added if even a single one of its integers is already among the known integers.
Therefore I would use the following algorithm:
List<List<int>> allItems = new List<List<int>>{
    new List<int>{1, 2, 3},
    new List<int>{4, 7, 9, 13},
    new List<int>{1, 2},
    new List<int>{2, 3},
    new List<int>{12, 15, 16}
};
// order by length, decreasing, so the "big" lists claim their numbers first
allItems = allItems.OrderByDescending(x => x.Count).ToList();
List<List<int>> result = new List<List<int>>();
// keep track of known numbers, so you don't have to loop over the kept lists
// https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.sortedset-1
SortedSet<int> knownItems = new SortedSet<int>();
foreach (List<int> l in allItems)
{
    bool allUnique = true;
    foreach (int elem in l)
    {
        if (knownItems.Contains(elem))
        {
            allUnique = false;
            break;
        }
    }
    // only a list containing no previously seen number is kept, and only then
    // do its numbers become known for all later (shorter) lists.
    // (I still have my doubts about how the data is allowed to look and what
    // special cases may pop up that ruin this, so use with care.)
    if (allUnique)
    {
        result.Add(l);
        foreach (int elem in l)
        {
            knownItems.Add(elem);
        }
    }
}
// output
foreach (List<int> item in result)
{
    Console.WriteLine(string.Join(", ", item));
}
Instead of looping over your original list twice in nested fashion (O(n^2)), you only loop over it once (O(n)), with a lookup in the set of known numbers for each element (a binary search tree lookup: O(log2(n)) each, O(n*log2(n)) overall).
Instead of removing from the list, you add to a new one. This uses more memory for the new list. The reordering is done because it makes it more likely that a subsequent list contains numbers already processed. However, sorting a large number of lists may be slower than the benefit you gain if you have many small lists; if you have even a few long ones, it may pay off.
Sorting your list of lists by length is valid because of this exchange in the comments:
What is to happen if a list has items from different lists? Say instead of new List{2, 3} it was new List{2, 4}?
That is unexpected behavior. You can see the ints as the id of a person. Each group of ints forms, for example, a family. If the algorithm creates [2, 4], then we are creating, for example, an extramarital relationship, which is not desirable.
From this I gather the arrays will be subsets of at most one other array, or be unique; therefore the order is irrelevant.
This also assumes that at least one array contains all elements of its subsets (and is therefore the longest one and comes first).
The sorting could be removed if that were not so, and should probably be removed if in doubt.
For example:
{1, 2, 3, 4, 5} - contains all elements that future arrays will have subsets of
{1, 4, 5} - must contain no element that {1, 2, 3, 4, 5} does not contain
{1, 2, 6} - illegal in this case
{7, 8, 9} - OK
{8, 9} - OK (will be ignored)
{7, 9} - OK (will be ignored, is only a subset of {7, 8, 9})
{1, 7} - illegal, but would be legal if {1, 2, 3, 4, 5, 7, 8, 9} were in this list; because it is longer it would come earlier, making this one valid to ignore.
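If the edited question is taken literally - drop a list only when a bigger list contains all of its elements - here is a hedged sketch of that reading using HashSet<int>.IsSupersetOf (my own variant, not part of the answer above):
// a sketch, assuming "eliminate" means: some strictly longer list
// contains every element of the smaller one (the edited question).
// requires System.Collections.Generic and System.Linq.
static List<List<int>> EliminateContainedLists(List<List<int>> allItems)
{
    // precompute hash sets for O(1) membership tests
    var sets = allItems.Select(l => new HashSet<int>(l)).ToList();
    var result = new List<List<int>>();
    for (int i = 0; i < allItems.Count; i++)
    {
        bool contained = false;
        for (int j = 0; j < allItems.Count; j++)
        {
            // look for a strictly longer list that is a superset of list i
            if (j != i && sets[j].Count > sets[i].Count && sets[j].IsSupersetOf(sets[i]))
            {
                contained = true;
                break;
            }
        }
        if (!contained)
        {
            result.Add(allItems[i]);
        }
    }
    return result;
}
Note that this is still quadratic in the number of lists, so it illustrates the semantics rather than something that scales to 10,000,000 rows.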

How can I make my procedure for finding the Nth most frequent element in an array more efficient and compact?

Here's an example of a solution I came up with:
using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
        var countlist = arr.Aggregate(new Dictionary<int, int>(), (D, i) => {
                D[i] = D.ContainsKey(i) ? (D[i] + 1) : 1;
                return D;
            })
            .AsQueryable()
            .OrderByDescending(x => x.Value)
            .Select(x => x.Key)
            .ToList();
        // print the element which appears with the second
        // highest frequency in arr
        Console.WriteLine(countlist[1]); // should print 3
    }
}
At the very least, I would like to figure out how to:
Cut down the query clauses by at least one. While I don't see any redundancy, this is the type of LINQ query where I fret about the overhead of all the intermediate structures created.
Figure out how to not build an entire list at the end. I just want the 2nd element in the enumerated sequence; I shouldn't need to materialize the entire list just to get a single element out of it.
int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
var lookup = arr.ToLookup(t => t);                      // group the values
var result = lookup.OrderByDescending(t => t.Count());  // sort groups by frequency
Console.WriteLine(result.ElementAt(1).Key);             // 2nd most frequent: prints 3
I would do this.
int[] arr = new int[] { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
int rank = 2;
var item = arr.GroupBy(x => x)             // group them
    .OrderByDescending(x => x.Count())     // sort based on number of occurrences
    .Skip(rank - 1)                        // traverse to the position
    .FirstOrDefault();                     // take the element
if (item != null)
{
    Console.WriteLine(item.Key);
    // output - 3
}
I started to answer, saw the above answers and thought I'd compare them instead.
Here is the Fiddle.
I put a stopwatch on each and took the number of ticks for each one. The results were:
Original: 50600
Berkser: 15970
Tommy: 3413
Hari: 1601
user3185569: 1571
It appears @user3185569 has a slightly faster algorithm than Hari and is about 30-40 times quicker than the OP's original version. Note that in @user3185569's answer above, it appears his is faster when scaled.
Update: the numbers I posted above were run on my PC. Using .NET Fiddle to execute produces different results:
Original: 46842
Berkser: 44620
Tommy: 11922
Hari: 13095
user3185569: 16491
This puts the Berkser algorithm slightly faster. I'm not entirely clear why this is the case, as I'm targeting the same .NET version.
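For reference, a minimal sketch of the kind of Stopwatch harness used for such a comparison (the method and delegate names are my own placeholders, not from the post):
using System;
using System.Diagnostics;

static long MeasureTicks(Action candidate)
{
    candidate(); // warm-up run so JIT compilation doesn't skew the timing
    var sw = Stopwatch.StartNew();
    candidate();
    sw.Stop();
    return sw.ElapsedTicks;
}

// usage: Console.WriteLine(MeasureTicks(() => { /* one candidate's code */ }));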
I came up with the following mash of LINQ and a dictionary, as what you're looking for is essentially an ordered dictionary:
void Run()
{
    int[] arr = new int[] { 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4 };
    int[] unique = arr.Distinct().ToArray();
    Dictionary<int, int> dictionary = unique.ToDictionary(k => k, v => 0);
    for (int i = 0; i < arr.Length; i++)
    {
        if (dictionary.ContainsKey(arr[i]))
        {
            dictionary[arr[i]]++;
        }
    }
    List<KeyValuePair<int, int>> solution = dictionary.ToList();
    // sort by count, descending
    solution.Sort((x, y) => -1 * x.Value.CompareTo(y.Value));
    // index 2 = third most frequent; with this input that prints 3
    System.Console.WriteLine(solution[2].Key);
}

Efficiently adding multiple elements to the start of a List in C#

I have a list that I would like to add multiple elements to the start of. Adding to the start is linear time because a List is backed by an array and the elements have to be moved one at a time, and I cannot afford to do this as many times as I would have to if I implemented it the naive way.
If I know exactly how many elements I am about to add, can I shift them all that much so that the linearity only has to happen once?
List<int> myList = new List<int> { 6, 7, 8, 9, 10 };
// Desired list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// Unacceptable: every Insert shifts the whole backing array
for (int i = 5; i >= 1; i--)
{
    myList.Insert(0, i);
}

// Is this concept possible?
int newElements = 5;
for (int i = myList.Count; i >= 0; i--)
{
    myList[i + newElements] = myList[i]; // This line is illegal, Index was out of range
}
for (int i = 0; i < newElements; i++)
{
    myList[i] = i + 1;
}
In this specific instance, access needs to be constant time, hence the use of List. I need to be able to add elements to both the start and the end of the data structure as fast as possible. I am okay with O(m), where m is the number of elements being added (since I don't think that can be avoided), but O(m*n), where n is the number of elements in the existing structure, is far too slow.
You can use InsertRange, which will be linear if the inserted collection implements ICollection<T>:
var newElements = new[] { 0, 1, 2, 3, 4 };
myList.InsertRange(0, newElements);
myList.InsertRange(0, new List<int> { 1, 2, 3, 4, 5 });
If your new elements are already in a List, you could use List.AddRange to add your "old" list to the end of the to-be-added-items-list.
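For instance, a sketch of that reversed approach (variable names follow the question):
var newItems = new List<int> { 1, 2, 3, 4, 5 };
newItems.AddRange(myList);  // append the "old" list after the new items
myList = newItems;          // swap the reference; one linear copy in total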
I would imagine myList.InsertRange(0, newElements) would suit you well. Microsoft will have made that as efficient as it can be.

Create multiple lists of unique entries from master list in c#

I need to process an outbound SMS queue and create batches of messages. The queued list might contain multiple messages to the same person. Batches do not allow this, so I need to run through the main outbound queue and create as many batches as necessary to ensure they contain unique entries.
Example:
Outbound queue = (1,2,3,3,4,5,6,7,7,7,8,8,8,8,9)
results in...
batch 1 = (1,2,3,4,5,6,7,8,9)
batch 2 = (3,7,8)
batch 3 = (7,8)
batch 4 = (8)
I can easily check for duplicates but I'm looking for a slick way to generate the additional batches.
Thanks!
Have a look at this approach using Enumerable.ToLookup and other LINQ methods:
var queues = new int[] { 1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 8, 8, 8, 9 };
var lookup = queues.ToLookup(i => i);
int maxCount = lookup.Max(g => g.Count());
List<List<int>> allbatches = Enumerable.Range(1, maxCount)
    .Select(count => lookup.Where(x => x.Count() >= count).Select(x => x.Key).ToList())
    .ToList();
Result is a list which contains four other List<int>:
foreach (List<int> list in allbatches)
    Console.WriteLine(string.Join(",", list));
1,2,3,4,5,6,7,8,9
3,7,8
7,8
8
Depending on the specific data structures used, the LINQ GroupBy extension method could be used (provided that the queue implements IEnumerable<T> for some type T) to group the messages by recipient; afterwards, the groups can be iterated separately.
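A minimal sketch of that idea, with plain ints standing in for the queued messages as in the example above:
var queue = new[] { 1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 8, 8, 8, 9 };
var groups = queue.GroupBy(i => i).ToList();
int batchCount = groups.Max(g => g.Count());
for (int batch = 0; batch < batchCount; batch++)
{
    // the n-th batch takes the n-th occurrence from every group that has one
    var items = groups.Where(g => g.Count() > batch).Select(g => g.Key);
    Console.WriteLine(string.Join(",", items));
}
Each iteration emits one batch with at most one message per recipient.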
A naive approach would be to walk over the input, creating and filling the batches as you go:
private static List<List<int>> CreateUniqueBatches(List<int> source)
{
    var batches = new List<List<int>>();
    int currentBatch = 0;
    foreach (var i in source)
    {
        // Find the index of the first batch that can take the number `i`
        while (currentBatch < batches.Count && batches[currentBatch].Contains(i))
        {
            currentBatch++;
        }
        if (currentBatch == batches.Count)
        {
            batches.Add(new List<int>());
        }
        batches[currentBatch].Add(i);
        currentBatch = 0;
    }
    return batches;
}
Output:
1, 2, 3, 4, 5, 6, 7, 8, 9
3, 7, 8
8
8
I'm sure this can be shortened or written in a functional way. I've tried using GroupBy, Distinct and Except, but couldn't figure it out that quickly.

How do I shift items in an array in C#?

Let's say that I have an array of integers like this:
1, 2, 3, 4, 5, 6, 7, 8
and I want to shift the elements of the array such that
The first element always remains fixed
Only the remaining elements get shifted like so ...
The last element in the array becomes the 2nd element and is shifted through the array with each pass.
Pass #1: 1, 2, 3, 4, 5, 6, 7, 8
Pass #2: 1, 8, 2, 3, 4, 5, 6, 7
Pass #3: 1, 7, 8, 2, 3, 4, 5, 6
Pass #4: 1, 6, 7, 8, 2, 3, 4, 5
Any assistance would be greatly appreciated.
Because this looks like homework, I'm posting an unnecessarily complex, but very hip LINQ solution:
int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };
int[] result = array.Take(1)                            // the fixed first element
    .Concat(array.Reverse().Take(1))                    // the last element becomes the 2nd
    .Concat(array.Skip(1).Reverse().Skip(1).Reverse())  // the rest, order preserved
    .ToArray();
Probably the fastest way to do this in C# is to use Array.Copy. I don't know much about pointers in C#, so there's probably a way that's even faster and avoids the array bounds checks and such, but the following should work. It makes several assumptions and doesn't check for errors, but you can fix it up.
void Shift<T>(T[] array)
{
    T last = array[array.Length - 1];
    // shift elements 1..n-2 one slot to the right
    Array.Copy(array, 1, array, 2, array.Length - 2);
    array[1] = last; // the old last element becomes the 2nd
}
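Usage, reproducing the passes from the question:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };
Shift(array); // 1, 8, 2, 3, 4, 5, 6, 7
Shift(array); // 1, 7, 8, 2, 3, 4, 5, 6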
EDIT
Optionally, there is Buffer.BlockCopy which according to this post performs fewer validations but internally copies the block the same way.
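A sketch of the same shift written with Buffer.BlockCopy, assuming an int[] here (it only works on arrays of primitives and takes offsets and counts in bytes):
void ShiftInts(int[] array)
{
    int last = array[array.Length - 1];
    // byte-based copy: shift elements 1..n-2 one slot to the right
    Buffer.BlockCopy(array, sizeof(int) * 1, array, sizeof(int) * 2, sizeof(int) * (array.Length - 2));
    array[1] = last;
}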
Because this looks like homework, I'm not going to solve it for you, but I have a couple of suggestions:
Remember to not overwrite data if it isn't somewhere else already. You're going to need a temporary variable.
Try traversing the array from the end to the beginning. The problem is probably simpler that way, though it can be done from front-to-back.
Make sure your algorithm works for an arbitrary-length array, not just one that's of size 8, as your example gave.
Although this sounds like homework, as others suggest, if you change to a List<>, you can get what you want with the following...
List<int> Nums2 = new List<int>();
for (int i = 1; i < 9; i++)
    Nums2.Add(i);
for (int i = 1; i < 10; i++)
{
    // copy the last element to position 1 (position 0 stays fixed), then drop it
    Nums2.Insert(1, Nums2[Nums2.Count - 1]);
    Nums2.RemoveAt(Nums2.Count - 1);
}
Define this:
public static class Extensions
{
    public static IEnumerable<T> Rotate<T>(this IEnumerable<T> enuml)
    {
        var count = enuml.Count();
        return enuml
            .Skip(count - 1)
            .Concat(enuml.Take(count - 1));
    }

    public static IEnumerable<T> SkipAndRotate<T>(this IEnumerable<T> enuml)
    {
        return enuml
            .Take(1)
            .Concat(
                enuml.Skip(1).Rotate()
            );
    }
}
Then call it like so:
var array = new [] { 1, 2, 3, 4, 5, 6, 7, 8 };
var pass1 = array.SkipAndRotate().ToArray();
var pass2 = pass1.SkipAndRotate().ToArray();
var pass3 = pass2.SkipAndRotate().ToArray();
var pass4 = pass3.SkipAndRotate().ToArray();
There's some repeated code there that you might want to refactor. And of course, I haven't compiled this so caveat emptor!
This is similar to Josh Einstein's but it will do it manually and will allow you to specify how many elements to preserve at the beginning.
static void ShiftArray<T>(T[] array, int elementsToPreserve)
{
    T temp = array[array.Length - 1];
    for (int i = array.Length - 1; i > elementsToPreserve; i--)
    {
        array[i] = array[i - 1];
    }
    array[elementsToPreserve] = temp;
}
Consumed:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };
ShiftArray(array, 2);
First pass: 1 2 8 3 4 5 6 7
