Efficiently finding arrays with the most occurrences of the same number - C#

Let's say I have the following nested array:
[
  [1, 2, 3],
  [4, 7, 9, 13],
  [1, 2],
  [2, 3],
  [12, 15, 16]
]
I only need the arrays with the most occurrences of the same numbers. In the above example this would be:
[
  [1, 2, 3],
  [4, 7, 9, 13],
  [12, 15, 16]
]
How can I do this efficiently with C#?
EDIT
Indeed my question is really confusing. What I wanted to ask is: how can I eliminate a sub-array when some bigger sub-array already contains all the elements of that smaller sub-array?
My current implementation of the problem is the following:
var allItems = new List<List<int>>{
    new List<int>{1, 2, 3},
    new List<int>{4, 7, 9, 13},
    new List<int>{1, 2},
    new List<int>{2, 3},
    new List<int>{12, 15, 16}
};
var itemsToEliminate = new List<List<int>>();
for(var i = 0; i < allItems.ToList().Count; i++){
    var current = allItems[i];
    var itemsToVerify = allItems.Where(item => item != current).ToList();
    foreach(var item in itemsToVerify){
        bool containsSameNumbers = item.Intersect(current).Any();
        if(containsSameNumbers && item.Count > current.Count){
            itemsToEliminate.Add(current);
        }
    }
}
allItems.RemoveAll(item => itemsToEliminate.Contains(item));
foreach(var item in allItems){
    Console.WriteLine(string.Join(", ", item));
}
This does work, but the nested loops for(var i = 0; i < allItems.ToList().Count; i++) and foreach(var item in itemsToVerify) give it bad performance, especially since the allItems list can contain about 10,000,000 rows.

I would remember the items that are already in the list.
First sort your lists by decreasing length, then check for each item whether it's already present.
Given your algorithm, an array is not added if even a single one of its integers is already in the set of known integers.
Therefore I would use the following algorithm:
List<List<int>> allItems = new List<List<int>>{
    new List<int>{1, 2, 3},
    new List<int>{4, 7, 9, 13},
    new List<int>{1, 2},
    new List<int>{2, 3},
    new List<int>{12, 15, 16}
};
allItems = allItems.OrderByDescending(x => x.Count).ToList(); // order by length, decreasing order
List<List<int>> result = new List<List<int>>();
SortedSet<int> knownItems = new SortedSet<int>(); // keep track of numbers, so you don't have to loop over kept lists
// https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.sortedset-1?view=netframework-4.7.2
foreach (List<int> l in allItems)
{
    bool allUnique = true;
    foreach (int elem in l)
    {
        if (knownItems.Contains(elem))
        {
            allUnique = false;
            break;
        }
    }
    // Only keep the list (and record its numbers) once we know that none
    // of its elements has been seen before. Adding to knownItems while
    // scanning would cause problems whenever a list starts with unseen
    // numbers but later hits a match that should discard it.
    // (I still have my doubts about what special cases the data may
    // allow, so use with care.)
    if (allUnique)
    {
        result.Add(l);
        foreach (int elem in l)
        {
            knownItems.Add(elem);
        }
    }
}
// output
foreach (List<int> item in result)
{
    Console.WriteLine(string.Join(", ", item));
}
Instead of looping over your original array twice in a nested fashion (O(n^2)), you loop over it once (O(n)) and do a lookup in the set of known numbers (a binary search tree lookup costs O(log n) per element, O(n*log n) overall).
Instead of removing from the array, you add to a new one; this uses more memory for the new list. The reordering is done because it is more likely that a subsequent array contains numbers that were already processed. However, sorting a large number of lists may cost more than the benefit it brings if you have many small lists; if you have even a few long ones, it may pay off.
Sorting your list of lists by length is valid because of the following exchange:
what is to happen if a list has items from different lists? say instead of new List{2, 3} it was new List{2, 4}?
That is unexpected behavior. You can see the ints as IDs of people. Each group of ints forms, for example, a family. If the algorithm creates [2, 4], then we are creating, for example, an extramarital relationship, which is not desirable.
From this I gather that each array will be a subset of at most one other array, or be unique. Therefore the order is irrelevant.
This also assumes that at least one such array would contain all elements of such subsets (and therefore be the longest one and come first.)
The sorting could be removed if it were not so, and should probably be removed if in doubt.
For example:
{1, 2, 3, 4, 5} - contains all elements that future arrays will have subsets of
{1, 4, 5} - must contain no element that {1,2,3,4,5} does not contain
{1, 2, 6} - illegal in this case
{7, 8 ,9} - OK
{8, 9} - OK (will be ignored)
{7, 9} - OK (will be ignored, is only subset in {7,8,9})
{1, 7} - illegal, but would be legal if {1,2,3,4,5,7,8,9} were in this list; because it is longer, it would have come earlier, making this one valid to ignore.
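As a side note: if the known numbers don't need to be kept in sorted order, a `HashSet<int>` gives average O(1) lookups instead of the `SortedSet<int>`'s O(log n). A minimal sketch of the keep-only-if-all-elements-unseen filter, under the same assumptions about the data as above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var allItems = new List<List<int>> {
    new List<int> { 1, 2, 3 },
    new List<int> { 4, 7, 9, 13 },
    new List<int> { 1, 2 },
    new List<int> { 2, 3 },
    new List<int> { 12, 15, 16 }
};

var seen = new HashSet<int>();           // numbers claimed by a kept list
var result = new List<List<int>>();
foreach (var l in allItems.OrderByDescending(x => x.Count))
{
    if (l.Any(seen.Contains))            // overlaps a list we already kept
        continue;
    result.Add(l);
    foreach (var e in l) seen.Add(e);    // claim this list's numbers
}

foreach (var item in result)
    Console.WriteLine(string.Join(", ", item));
// prints the three survivors, longest first:
// 4, 7, 9, 13 / 1, 2, 3 / 12, 15, 16
```

The same caveats apply as for the SortedSet version: this relies on sub-arrays being proper subsets of a longer array that sorts ahead of them.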

Related

Ordered parallel execution

I have an ordered list like [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. I am passing it to a Parallel.ForEach statement. Can I somehow achieve the following ordering of execution of buckets: process the first 3 items [1, 2, 3], where the ordering within the bucket itself is not mandatory and can be [2, 1, 3] for instance; then process the next 3 items [4, 5, 6], etc.?
I'm not sure that you can do this directly, but I would suggest you divide the input list into smaller lists and then process each sublist with Parallel.ForEach.
List<int> fruits = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
List<List<int>> ls = new List<List<int>>();
for (int i = 0; i < fruits.Count; i += 3)
{
    ls.Add(fruits.GetRange(i, Math.Min(3, fruits.Count - i)));
}
foreach (List<int> group in ls)
{
    Parallel.ForEach(group, fruit =>
    {
        // process fruit
    });
}
Here, 3 is the length of each small list.
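On .NET 6 or later, the manual GetRange loop above can be replaced by Enumerable.Chunk, which splits a sequence into arrays of the given size (the last chunk may be shorter). A minimal sketch, with a thread-safe list standing in for the real per-item work:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var items = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var processed = new List<int>();

// Chunk(3) yields [1,2,3], [4,5,6], [7,8,9], [10]
foreach (int[] group in items.Chunk(3))
{
    // chunks run one after another; items inside a chunk run in parallel
    Parallel.ForEach(group, item =>
    {
        lock (processed) processed.Add(item);   // stand-in for real work
    });
}

Console.WriteLine(processed.Count);  // prints 10
```

Because the outer foreach is sequential, chunk N+1 never starts before chunk N has finished, which is exactly the ordering the question asks for.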
Even if the accepted answer fulfills the requirements perfectly, there is some overhead in it. First of all, since we are talking about the TPL, the volume of the data arrays is probably big, so simply creating that many arrays is very memory-consuming.
Also, the solution suggested by @viveknuna does not guarantee the order of the chunks. If that is OK, you should probably use the answer from @DmitryBychenko with a small update:
const int chunkSize = 3;
var array = Enumerable.Range(1, 9).ToArray();

// get the chunks of indexes into the array, in groups of 3
var partitioner = Partitioner.Create(0, array.Length, chunkSize);
// use all the system resources
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

// use the partitioner for the chunks, so the outer parallel foreach
// starts a task for each chunk: [1, 2, 3], [4, 5, 6], [7, 8, 9]
Parallel.ForEach(partitioner, parallelOptions, part =>
{
    // the inner foreach handles one chunk's items in parallel
    Parallel.ForEach(array.Skip(part.Item1).Take(chunkSize), parallelOptions, value =>
    {
        // handle the array value in parallel
    });
});
In the given code, if you set ParallelOptions.MaxDegreeOfParallelism to 1 for the outer loop, you'll get the desired ordered parallel execution, chunk by chunk.

Efficiently adding multiple elements to the start of a List in C#

I have a list that I would like to add multiple elements to the start of. Adding to the start is linear time because a List is backed by an array and every insert shifts the existing elements one at a time; I cannot afford to pay that cost once per inserted element.
If I know exactly how many elements I am about to add, can I shift everything by that much so the linear move only has to happen once?
List<int> myList = new List<int> { 6, 7, 8, 9, 10 };
// Desired list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// Unacceptable: shifts the whole list on every insert
for (int i = 5; i >= 1; i--)
{
    myList.Insert(0, i);
}

// Is this concept possible?
int newElements = 5;
for (int i = myList.Count - 1; i >= 0; i--)
{
    myList[i + newElements] = myList[i]; // This line is illegal, Index was out of range
}
for (int i = 0; i < newElements; i++)
{
    myList[i] = i + 1;
}
In this specific instance, access needs to be constant time, hence the use of List. I need to be able to add elements to both the start and the end of the data structure as fast as possible. I am okay with O(m), where m is the number of elements being added (since I don't think that can be avoided), but O(m*n), where n is the number of elements in the existing structure, is far too slow.
You can use InsertRange which will be linear if the inserted collection implements ICollection<T>:
var newElements = new[] { 0, 1, 2, 3, 4 };
myList.InsertRange(0, newElements);
myList.InsertRange(0, new List<int> { 1, 2, 3, 4, 5 });
If your new elements are already in a List, you could use List.AddRange to add your "old" list to the end of the to-be-added-items-list.
I would imagine myList.InsertRange(0, newElements) would suit you well. Microsoft will have made that as efficient as it can be.
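For completeness, a quick sketch showing that a single InsertRange call produces the desired list from the question's example:

```csharp
using System;
using System.Collections.Generic;

var myList = new List<int> { 6, 7, 8, 9, 10 };
myList.InsertRange(0, new[] { 1, 2, 3, 4, 5 });

Console.WriteLine(string.Join(", ", myList));
// prints: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
```

When the inserted collection implements ICollection&lt;T&gt; (so its Count is known up front), List&lt;T&gt; makes room for all the new items in one pass, which is exactly the single O(n + m) shift the question asks about.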

Checking to see if last 2 numbers in array both exist in 2D array using loops

I have two arrays: one single-dimensional and the other two-dimensional (jagged).
int[][] array1 = {
    new int [] {1, 22, 3, 44, 5, 66},
    new int [] {11, 22, 33, 44, 55, 66},
    new int [] {1, 2, 3, 4, 5, 6},
};
int[] array2 = new int[] {1, 2, 3, 5, 66};
I need to create a loop which searches array1 for both of the last two numbers in array2, i.e., it should return how many inner arrays of array1 contain both 5 and 66. Here that is 1, as the other two inner arrays contain only one of the two numbers each.
I already managed to write a function which returns how many times array2 as a whole exists in array1, this new function is effectively a refinement of that.
for (int a = 0; a < array1.Length; a++)
{
    for (int b = 0; b < array2.Length; b++)
    {
        if (array2[b] == array1[a][b])
            count++;
        temp[b] = array1[a][b];
    }
}
I feel all that would be needed to search for just the last two digits is a slight change to this function. I tried to add another loop, but that didn't work either. How would I go about doing this? I'm using loops and not Contains for a reason, since I'm still learning the basics.
One thing is not clear in the question: does it matter at which positions the two digits occur in the 2D array?
If it does not, then you can use Intersect(), which produces the set intersection of two sequences by using the default equality comparer to compare values:
var result = array1.Count(x => x.Intersect(array2.Reverse().Take(2)).Count() == 2);
If you paid attention, we used this expression to get the last two elements of array2:
array2.Reverse().Take(2);
Additional:
If you want to find whether the last two elements of each array in the 2D array are equal to the last two elements of array2, then you can try this LINQ solution:
var result = array1.Count(x => x.Reverse().Take(2).SequenceEqual(array2.Reverse().Take(2)));
Explanation of used extension methods:
Reverse() inverts the order of the elements in a sequence.
Take() returns a specified number of contiguous elements from the start of a sequence.
SequenceEqual() determines whether two sequences are equal by comparing the elements by using the default equality comparer for their type.
After getting last two elements of both arrays, we will use SequenceEqual() to determine if both arrays are equal.
var res = array1.Where(x => x.Contains(array2.Last()) && x.Contains(array2[array2.Length - 2])).Count();
Explanation:
array1.Where takes every subarray of array1 and keeps the ones that meet a certain condition: that the subarray contains both the last and the next-to-last element of array2. The Count() method then returns the number of subarrays that met the condition.
You can create an array of target values and then count the number of times that the intersection of that array with each subarray in the 2D array contains all the items in the target array:
using System;
using System.Linq;

namespace ConsoleApplication2
{
    class Program
    {
        [STAThread]
        private static void Main()
        {
            int[][] array1 =
            {
                new [] {1, 22, 3, 44, 5, 66},
                new [] {11, 22, 33, 44, 55, 66},
                new [] {1, 2, 3, 4, 5, 6},
                new [] {1, 66, 3, 4, 5, 6} // This one has the target items out of order.
            };
            int[] array2 = {1, 2, 3, 5, 66};

            // Extract the targets like this; it avoids making a copy
            // of array2 which occurs if you use IEnumerable.Reverse().
            int[] targets = {array2[array2.Length - 2], array2[array2.Length - 1]};

            // Count the number of times that each subarray in array1 includes
            // all the items in targets:
            int count = array1.Count(array => array.Intersect(targets).Count() == targets.Length);

            Console.WriteLine(count);
        }
    }
}
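Since the asker wants to stick with plain loops rather than Contains or LINQ, here is a loops-only sketch of the same count, using the question's example arrays:

```csharp
using System;

int[][] array1 =
{
    new [] { 1, 22, 3, 44, 5, 66 },
    new [] { 11, 22, 33, 44, 55, 66 },
    new [] { 1, 2, 3, 4, 5, 6 },
};
int[] array2 = { 1, 2, 3, 5, 66 };

// the two target values: the second-to-last and last elements of array2
int secondLast = array2[array2.Length - 2];
int last = array2[array2.Length - 1];

int count = 0;
foreach (int[] row in array1)
{
    bool hasSecondLast = false, hasLast = false;
    foreach (int v in row)
    {
        if (v == secondLast) hasSecondLast = true;
        if (v == last) hasLast = true;
    }
    if (hasSecondLast && hasLast) count++;   // this row contains both targets
}

Console.WriteLine(count);  // prints 1: only {1, 22, 3, 44, 5, 66} contains both 5 and 66
```

Using two flags per row (instead of counting matches) also handles duplicate values correctly: a row containing 66 twice but no 5 is still not counted.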

Best way to construct the algorithm of tree in C#

I have a small problem finding the most efficient solution. I have a number of student ids, for example 10. Some of those ids are relatives (siblings) of each other. For those who are siblings, keep only one of them for identification; it doesn't matter which one, the first is fine.
For example, the student ids
original
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
where 1, 2, 3 are one family siblings, and 8, 9 are another. At the end I should have:
expected
1, 4, 5, 6, 7, 8, 10
I am doing it through loop.
UPDATE:
I just stopped implementing it, because it was getting bigger and bigger. This is the big picture of what I have in mind: I gather all sibling ids for each given id, row by row, and then iterate over each of them. But like I said, it's a waste of time.
Code (in conceptual)
static string Trimsiblings(string ppl) {
    string[] pids = ppl.Split(',');
    Stack<string> personid = new Stack<string>();
    foreach(string pid in pids) {
        // access the database and check for all siblings
        // of each individual pid, such as in the example above:
        // row 1: field1=1, field2=2, field3=3
        // row 2: field1=8, field2=9
        query = Select..where ..id = pid; // this line is pseudo code
        for(int i = 0; i < query.Length; i++) {
            foreach(string pid in pids) {
                if(query.field1 == pid) {
                    personid.Push(pid);
                }
            }
        }
    }
}
For efficient code, it is essential to notice that one member (e.g., the first) of each family of siblings is irrelevant, because it will stay in the output. That is, we simply have to
Create a list of items that must not appear in the output
Actually remove them
Of course, this only works under the assumption that every sibling actually appears in the original list of ids.
In code:
int[] ids = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int[][] families = new int[2][] {
    new int [] {1, 2, 3},
    new int [] {8, 9}
};
var itemsToOmit = families.
    Select(family => family.Skip(1)).
    Aggregate((family1, family2) => family1.Concat(family2));
var cleanedIds = ids.Except(itemsToOmit);
Edit: Since you mention that you are not too familiar with the syntax, I will give some further explanations
The expressions I've used are extension methods that are part of the System.Linq namespace
The Select method transforms one sequence into another sequence. Since families is sequence of sequences, family will be a sequence of siblings in the same family (i.e., 1, 2, 3 and 8, 9 in this particular case)
The Skip method skips a number of elements of a sequence. Here, I've decided to always skip the first element (for reasons, see above)
The Aggregate method combines element of a sequence into a single element. Here, all families of siblings are just concatenated to each other (except for the first sibling of each family which has been omitted via Skip)
The Except method returns all elements of a sequence that are not in the sequence that is given as an argument.
I hope this clarifies things a bit.
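As a usage note, the Select + Aggregate pair above builds the per-family tails and then concatenates them; SelectMany does both steps in one call, which I believe is equivalent here:

```csharp
using System;
using System.Linq;

int[] ids = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int[][] families =
{
    new [] { 1, 2, 3 },
    new [] { 8, 9 }
};

// skip the first member of each family, flatten the rest into one sequence
var itemsToOmit = families.SelectMany(family => family.Skip(1));
var cleanedIds = ids.Except(itemsToOmit).ToArray();

Console.WriteLine(string.Join(", ", cleanedIds));
// prints: 1, 4, 5, 6, 7, 8, 10
```

This matches the expected output from the question, and SelectMany also avoids the edge case where Aggregate throws on an empty families array.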
Here's how
public static String Trimsiblings(String ppl) {
    var table = GetSiblingTable();
    var pids = ppl.Split(',');
    return
        String.Join(", ", (
            from id in pids.Select(x => int.Parse(x))
            where (
                from row in table.AsEnumerable()
                select
                    from DataColumn column in table.Columns
                    let data = row[column.ColumnName]
                    where DBNull.Value != data
                    select int.Parse((String)data)
            ).All(x => false == x.Contains(id) || x.First() == id)
            select id.ToString()).ToArray()
        );
}

// emulation of getting the table from the database
public static DataTable GetSiblingTable() {
    var dt = new DataTable();
    // define field1, ..., fieldn
    for (int n = 3, i = 1 + n; i-- > 1; dt.Columns.Add("field" + i))
        ;
    dt.Rows.Add(new object[] { 1, 2, 3 });
    dt.Rows.Add(new object[] { 8, 9 });
    return dt;
}

public static void TestMethod() {
    Console.WriteLine("{0}", Trimsiblings("1, 2, 3, 4, 5, 6, 7, 8, 9, 10"));
}
Comment if you would like an explanation of why this works.

C# Calculate items in List<int> values vertically

I have a list of int values, something like below (the upper and lower bounds are dynamic):
1, 2, 3
4, 6, 0
5, 7, 1
I want to sum the column values vertically, like
1 + 4 + 5 = 10
2 + 6 + 7 = 15
3 + 0 + 1 = 4
Expected Result = 10,15,4
Any help would be appreciated
Thanks
Deepu
Here's the input data using array literals, but the subsequent code works exactly the same on arrays or lists.
var grid = new []
{
new [] {1, 2, 3},
new [] {4, 6, 0},
new [] {5, 7, 1},
};
Now produce a sequence with one item for each column (take the number of elements in the shortest row), in which the value of the item is the sum of the row[column] value:
var totals = Enumerable.Range(0, grid.Min(row => row.Count()))
.Select(column => grid.Sum(row => row[column]));
Print that:
foreach (var total in totals)
Console.WriteLine(total);
If you use a 2D array you can just sum the first, second,... column of each row.
If you use a 1D array you can simply use a modulo:
int[] results = new int[colCount];
for (int i = 0; i < list.Count; i++)
{
    results[i % colCount] += list[i];
}
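For instance, assuming the three rows arrive flattened into one list (row by row), the modulo approach works out like this:

```csharp
using System;
using System.Collections.Generic;

// rows 1,2,3 / 4,6,0 / 5,7,1 flattened row by row into one list
var list = new List<int> { 1, 2, 3, 4, 6, 0, 5, 7, 1 };
int colCount = 3;

int[] results = new int[colCount];
for (int i = 0; i < list.Count; i++)
{
    results[i % colCount] += list[i];  // i % colCount is the column index
}

Console.WriteLine(string.Join(",", results));
// prints: 10,15,4
```

This gives exactly the expected result from the question, 10,15,4, without needing a nested structure.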
Do you have to use a List object? Otherwise, I would use a two-dimensional array.
Failing that, you could simply work out how to reach rows and columns separately, so you can add the numbers within a simple for-loop. It depends on the methods the List object offers.
Quite inflexible given the question, but how about:
int ans = 0;
for (int i = 0; i < list.Count; i += 3)
{
    ans += list[i];
}
You could either run the same thing 3 times with a different initial iterator value, or put the whole thing in another loop with the start value as an iterator that runs 3 times.
Having said this, you may want to a) look at a different way of storing your data if, indeed, it is in a single list, and b) look at more flexible ways to do this, or wrap it in a function which takes different column counts into account...
Cheers,
Adam