I have a 2-dimensional byte array that looks something like this:
0 0 0 0 1
1 1 1 1 0
0 0 1 1 1
1 0 1 0 1
Each value in the array can only be 0 or 1. The above simplified example shows 4 rows with each row having 5 columns. I am trying to figure out how to use LINQ to return the index to the row that has the largest number of 1s set, which in the above example should return 1.
The following non LINQ C# code solves the problem:
static int GetMaxIndex(byte[,] TwoDArray)
{
// This method finds the row with the greatest number of 1s set.
//
int NumRows = TwoDArray.GetLength(0);
int NumCols = TwoDArray.GetLength(1);
int RowCount, MaxRowCount = 0, MaxRowIndex = 0;
//
for (int LoopR = 0; LoopR < NumRows; LoopR++)
{
RowCount = 0;
for (int LoopC = 0; LoopC < NumCols; LoopC++)
{
if (TwoDArray[LoopR, LoopC] != 0)
RowCount++;
}
if (RowCount > MaxRowCount)
{
MaxRowCount = RowCount;
MaxRowIndex = LoopR;
}
}
return MaxRowIndex;
}
static void Main()
{
byte[,] Array2D = new byte[4, 5] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
int MaxInd = GetMaxIndex(Array2D);
Console.WriteLine("MaxInd = {0}", MaxInd);
}
So, my questions are:
How can LINQ be used to solve this, and would using LINQ here be less efficient than using the non-LINQ code above?
Is it possible to solve this problem with PLINQ? Or, would it be more efficient to use the Task Parallel Library (TPL) directly for the above code and split out the count of the number of 1s in each row to a separate thread, assuming that each row has at least 1,000 columns?
It's hard to work with multidimensional arrays in LINQ, but here's how you could do it:
var arr = new [,] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
var data =
Enumerable.Range(0, 4)
.Select(
row =>
new
{
index = row,
count = Enumerable.Range(0, 5).Select(col => arr[row, col]).Count(x => x == 1)
})
.OrderByDescending(x => x.count)
.Select(x => x.index)
.First();
Here is how I would do it. It's more or less the same as the others, but without any Enumerable.Range (not that there is anything wrong with it; I use it all the time, it just makes the code more indented in this case).
This one also includes the PLINQ part. async/await wouldn't be suitable here, because the work is CPU-bound and async/await is better suited to I/O-bound operations. Your code would end up executing sequentially if you used async/await rather than PLINQ. This is because an awaited call doesn't run in parallel with anything until it actually yields the thread, and purely synchronous functions (such as CPU-bound work) never actually await: they just run all the way through. Basically, it would finish the first thing in your list before it even started the next, so execution would be sequential. PLINQ explicitly starts parallel tasks and doesn't have this issue.
//arry is your 2d byte array (byte[,] arry)
var maxIndex = arry
.Cast<byte>() //flatten the entire array into a sequence of bytes (row by row)
.Select((b, i) => new // attach indexes sequentially, so they match the source order
{
value = b,
index = i
})
.AsParallel() //make the transition to PLINQ (remove this to not use it)
.GroupBy(g => g.index / arry.GetLength(1)) // group it by rows; the key is the row index
.Select(g => new
{
sum = g.Sum(g2 => (int)g2.value), //sum each row
index = g.Key //use the key: PLINQ does not guarantee the order of the groups
})
.OrderByDescending(g => g.sum) //max by sum
.Select(g => g.index) //grab the index
.First(); //the index of the row with the highest sum
In terms of efficiency, you would probably get better results with your for loop. The question I would ask is, which is more readable and clear?
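For comparison, here is a minimal sketch of the data-parallel side of the TPL (Parallel.For), which, unlike async/await, is meant for CPU-bound loops. The class and method names (RowCounter, GetMaxIndexParallel) are made up for illustration, and whether this beats the plain loop depends on the array size, so measure before relying on it.

```csharp
using System;
using System.Threading.Tasks;

static class RowCounter
{
    // Counts the 1s of each row in parallel, then picks the fullest row sequentially.
    public static int GetMaxIndexParallel(byte[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        var counts = new int[rows];

        Parallel.For(0, rows, r =>
        {
            int count = 0;
            for (int c = 0; c < cols; c++)
                count += grid[r, c];   // values are only 0 or 1
            counts[r] = count;         // each iteration writes only its own slot
        });

        // The first row with the highest count wins, as in the original loop.
        int best = 0;
        for (int r = 1; r < rows; r++)
            if (counts[r] > counts[best])
                best = r;
        return best;
    }
}
```

Each iteration writes to a distinct element of counts, so no locking is needed; the final scan over the per-row counts is cheap and stays sequential.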
1) You can do it with LINQ this way...
private static int GetMaxIndex(byte[,] TwoDArray) {
return Enumerable.Range(0, TwoDArray.GetLength(0))
.Select(
x => new {
Index = x,
Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
})
.OrderByDescending(x => x.Count)
.First()
.Index;
}
... you'd have to test it to see if LINQ is faster or slower.
2) It is possible to use PLINQ. Just use ParallelEnumerable.Range for the row index generator
private static int GetMaxIndex2(byte[,] TwoDArray) {
return ParallelEnumerable.Range(0, TwoDArray.GetLength(0))
.Select(
x => new {
Index = x,
Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
})
.OrderByDescending(x => x.Count)
.First()
.Index;
}
Looking at the issue, this is really a two part answer for whatever is "more efficient" for your code. The loop presented is already very lean on resources, but could be more clear on the intent.
Based on the size of data being moved around, even at 10x that, PLINQ is going to be more resource intensive, just because of how much work it is to spin up a thread.
1.) Using LINQ can make this method more readable
Most 2d array LINQ queries I've come across convert it into a jagged array (or array of arrays) before searching. Here's a helper method that does that conversion for us, and to help make this guy look cleaner:
public static T[][] GetJagged<T>(this T[,] raw)
{
int lenX = raw.GetLength(0);
int lenY = raw.GetLength(1);
T[][] jagged = new T[lenX][];
for (int x = 0; x < lenX; x++)
{
jagged[x] = new T[lenY];
for (int y = 0; y < lenY; y++)
{
jagged[x][y] = raw[x, y];
}
}
return jagged;
}
Now all that is left is to query the jagged array, where each member is a 1d array holding one row, and compute the sum of each row. Here, I use the selector (b => b), which simply passes each byte through to the Sum method.
static int GetMaxIndexLINQ(byte[,] TwoDArray)
{
byte[][] jagged = TwoDArray.GetJagged();
// Materialize the sums once so the query is not re-run for Max and IndexOf.
int[] rowSums = (from bitRow in jagged
select bitRow.Sum((b) => b)).ToArray();
int maxSum = rowSums.Max();
int MaxRowIndex = Array.IndexOf(rowSums, maxSum);
return MaxRowIndex;
}
This way comes out very legible and even if the reader is new to coding, it's pretty easy to get the gist of what's happening here.
I'd like to point out that making your code more readable is making it more efficient. Teamwork makes the dream work, and the quicker a teammate can clearly make sense of what is happening in your code, the better for everyone.
2.) Optimizing for performance
As I said before, there isn't a lot happening here that can be made any leaner; any extra method calls or unnecessary checks will just slow this process down.
That being said, there is a small change that makes for an easy optimization. Because in this instance we are only dealing with 1s and 0s, we don't need to check whether a value is 0 at all: we can just add it to our running sum. That removes a conditional branch from the inner loop, which is usually at least as fast, though you should measure to confirm.
static int GetMaxIndex_EvenBetter(byte[,] TwoDArray)
{
int NumRows = TwoDArray.GetLength(0);
int NumCols = TwoDArray.GetLength(1);
int RowCount, MaxRowCount = 0, MaxRowIndex = 0;
for (int row = 0; row < NumRows; row++)
{
RowCount = 0;
for (int col = 0; col < NumCols; col++)
{
RowCount += TwoDArray[row, col]; //See my change here
}
if (RowCount > MaxRowCount)
{
MaxRowCount = RowCount;
MaxRowIndex = row;
}
}
return MaxRowIndex;
}
In most other cases you aren't working with just 1s and 0s, so you DO want to check those values before adding. Here, however, the check is unnecessary.
// This code is extracted from
// http://www.codeproject.com/Articles/170662/Using-LINQ-and-Extension-Methods-in-C-to-Sort-Vect
private static IEnumerable<T[]> ConvertToSingleDimension<T>(T[,] source)
{
T[] arRow;
for (int row = 0; row < source.GetLength(0); ++row)
{
arRow = new T[source.GetLength(1)];
for (int col = 0; col < source.GetLength(1); ++col)
arRow[col] = source[row, col];
yield return arRow;
}
}
// Pair each row (byte[]) with its row index in an anonymous type {Values, Index} for the LINQ query
var result = (from item in ConvertToSingleDimension(Array2D).Select((i, index) => new {Values = i, Index = index})
orderby item.Values.Sum(i => i) descending, item.Index
select item.Index).FirstOrDefault();
Example:
Say I have
var arr = new int[][] {
new[] { 1, 9, 4 },
new[] { 2, 4, 4 },
new[] { 3, 0, 5 }
};
and say I want the indices of 3. So I want a method that does the equivalent of
for(int i = 0; i < arr.Length; ++i)
for(int j = 0; j < arr[i].Length; ++j)
if(arr[i][j] == 3)
return Tuple.Create(i, j);
ideally without having to write any extension methods and ideally in a way that is compact and efficient.
You can do this in LINQ in a "compact" manner - but as the comments suggest, a regular loop will trounce this for efficiency:
var indexes = arr.Select((a, x) => a.Select((v, y) => new { X = x, Y = y })
.Where(z => arr[z.X][z.Y] == 3)).SelectMany(x => x);
Even using LINQ, you still have to traverse the entire collection and build up the indexes (into an anonymous type here) and determine if they meet your criteria (and then flatten the result using SelectMany).
Also note that this will return all instances; to get the first occurrence, simply add a .First() on the end.
I'd strongly recommend a static helper or extension method in this case:
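A minimal sketch of what such a static helper might look like. The names JaggedSearch and IndexOf are made up for illustration and are not from the original answer; returning null for "not found" is just one possible convention.

```csharp
using System;

static class JaggedSearch
{
    // Returns the (row, column) of the first element equal to target,
    // scanning row by row, or null when the value does not occur.
    public static Tuple<int, int> IndexOf(int[][] source, int target)
    {
        for (int i = 0; i < source.Length; i++)
            for (int j = 0; j < source[i].Length; j++)
                if (source[i][j] == target)
                    return Tuple.Create(i, j);
        return null;
    }
}
```

With the arr from the question, JaggedSearch.IndexOf(arr, 3) gives (2, 0). The loop stops at the first match, which the LINQ version only does once .First() is added.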
There are a bunch of answers on how to get a subarray with a start and end index, or a start index and length. But I'm looking for a way to get a subarray based on an index array.
Here is what I have (which works fine but seems clunky).
//sample arrays (could be unordered)
double[] expiry= { 0.99, 0.9, 0.75, 0.60, 0.5, 0.4, ...};
double[] values = { 0.245, 0.24, 0.235, 0.22, 0.21, 0.20, ... };
//find index of all elements meeting criteria in array expiry
int[] expind = expiry
.Select((b, i) => b == 0.75? i : -1)
.Where(i => i != -1).ToArray();
//create new arrays of appropriate size
double[] newvalues = new double[expind.Length];
//populate new arrays based on index
for (int i = 0; i < expind.Length; i++)
newvalues[i] = values[expind[i]];
//check values
foreach (var item in newvalues)
Console.WriteLine(item);
Is there a more efficient and general way of doing this please?
UPDATE
Next attempt (still not super efficient, but at least loopless):
Array.Sort(expiry, values);
double criteria = 0.75;
int start = Array.IndexOf(expiry, criteria);
int last = Array.LastIndexOf(expiry, criteria);
int length = last - start + 1;
double[] newvalues2 = new double[length];
Array.Copy(values, start, newvalues2, 0, length);
Hi you can find the values in this way using lambda expression:
double[] newVals = values.Where((t, i) => expiry[i] == 0.75).ToArray();
This is a bit more concise. There is no need to actually put the indexes into an expind array; just use the indices directly with the Where() overload that takes an index:
double[] newpoints = points.Where((p, i) => (expiry[i] == 0.75)).ToArray();
double[] newvalues = values.Where((v, i) => (expiry[i] == 0.75)).ToArray();
Now, if for some reason you already have an array of expind indices, but not the original array of expiry it came from, you can do this:
double[] newvalues = expind.Select(ind => values[ind]).ToArray();
Depending on the circumstances, this might work for you.
private static IEnumerable<double> GetByCondition(List<double> expiry, List<double> value)
{
for(int i = 0; i < expiry.Count; i++)
if(expiry[i] == 0.75)
yield return value[i];
}
Furthermore, I'd make it an extension method if it is used frequently on your arrays/lists.
public static IEnumerable<double> GetValuesByExpiry(
this List<double> self, List<double> values)
{
return GetByCondition(self, values);
}
As @Corak mentioned, the problem might be eliminated altogether if you merge those two arrays into a single one consisting of tuples, if that is appropriate in your case, of course. You can probably zip them together.
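A rough sketch of the zipping idea using Enumerable.Zip. The class and method names (ZipDemo, ValuesWithExpiry) are hypothetical, and the exact == comparison on doubles simply mirrors the question's code.

```csharp
using System;
using System.Linq;

static class ZipDemo
{
    // Pairs each expiry with the value at the same position,
    // then keeps the values whose expiry equals target.
    public static double[] ValuesWithExpiry(double[] expiry, double[] values, double target)
    {
        return expiry
            .Zip(values, (e, v) => new { Expiry = e, Value = v })
            .Where(p => p.Expiry == target)
            .Select(p => p.Value)
            .ToArray();
    }
}
```

Zip stops at the end of the shorter sequence, so arrays of different lengths do not throw here; they are silently truncated.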
I have a query which I get as:
var query = Data.Items
.Where(x => criteria.IsMatch(x))
.ToList<Item>();
This works fine.
However now I want to break up this list into x number of lists, for example 3. Each list will therefore contain 1/3 the amount of elements from query.
Can it be done using LINQ?
You can use PLINQ partitioners to break the results into separate enumerables.
var partitioner = Partitioner.Create<Item>(query);
var partitions = partitioner.GetPartitions(3);
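You'll need to reference the System.Collections.Concurrent namespace. partitions will be a list of IEnumerator<Item> (GetPartitions returns an IList<IEnumerator<Item>>), where each enumerator yields a portion of the query. Note that the partitions are not guaranteed to be of equal size.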
I think something like this could work, splitting the list into IGroupings.
const int numberOfGroups = 3;
var groups = query
.Select((item, i) => new { item, i })
.GroupBy(e => e.i % numberOfGroups);
You can use Skip and Take in a simple for to accomplish what you want
var groupSize = (int)Math.Ceiling(query.Count() / 3d);
var result = new List<List<Item>>();
for (var j = 0; j < 3; j++)
result.Add(query.Skip(j * groupSize).Take(groupSize).ToList());
If the order of the elements doesn't matter using an IGrouping as suggested by Daniel Imms is probably the most elegant way (add .Select(gr => gr.Select(e => e.item)) to get an IEnumerable<IEnumerable<T>>).
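Put together, the grouping approach with that extra projection might look like the following sketch. RoundRobin.Split is a made-up name, and the inner ToList calls are added here only to materialize the groups.

```csharp
using System.Collections.Generic;
using System.Linq;

static class RoundRobin
{
    // Deals the items out like cards: item i goes to group i % numberOfGroups.
    public static List<List<T>> Split<T>(IEnumerable<T> source, int numberOfGroups)
    {
        return source
            .Select((item, i) => new { item, i })
            .GroupBy(e => e.i % numberOfGroups)
            .Select(gr => gr.Select(e => e.item).ToList())
            .ToList();
    }
}
```

For the numbers 0 to 9 and three groups this yields { 0, 3, 6, 9 }, { 1, 4, 7 } and { 2, 5, 8 }: the sizes are balanced, but neighbouring elements end up in different groups.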
If however you want to preserve the order you need to know the total number of elements. Otherwise you wouldn't know when to start the next group. You can do this with LINQ but it requires two enumerations: one for counting and another for returning the data (as suggested by Esteban Elverdin).
If enumerating the query is expensive you can avoid the second enumeration by turning the query into a list and then use the GetRange method:
public static IEnumerable<List<T>> SplitList<T>(List<T> list, int numberOfRanges)
{
int sizeOfRanges = list.Count / numberOfRanges;
int remainder = list.Count % numberOfRanges;
int startIndex = 0;
for (int i = 0; i < numberOfRanges; i++)
{
int size = sizeOfRanges + (remainder > 0 ? 1 : 0);
yield return list.GetRange(startIndex, size);
if (remainder > 0)
{
remainder--;
}
startIndex += size;
}
}
static void Main()
{
List<int> list = Enumerable.Range(0, 10).ToList();
IEnumerable<List<int>> result = SplitList(list, 3);
foreach (List<int> values in result)
{
string s = string.Join(", ", values);
Console.WriteLine("{{ {0} }}", s);
}
}
The output is:
{ 0, 1, 2, 3 }
{ 4, 5, 6 }
{ 7, 8, 9 }
You can create an extension method:
public static IList<List<T>> GetChunks<T>(this IList<T> items, int numOfChunks)
{
if (items.Count < numOfChunks)
throw new ArgumentException("The number of elements is lower than the number of chunks");
int div = items.Count / numOfChunks;
int rem = items.Count % numOfChunks;
var listOfLists = new List<T>[numOfChunks];
for (int i = 0; i < numOfChunks; i++)
listOfLists[i] = new List<T>();
int currentGrp = 0;
int currRemainder = rem;
foreach (var el in items)
{
int currentElementsInGrp = listOfLists[currentGrp].Count;
if (currentElementsInGrp == div && currRemainder > 0)
{
currRemainder--;
}
else if (currentElementsInGrp >= div)
{
currentGrp++;
}
listOfLists[currentGrp].Add(el);
}
return listOfLists;
}
then use it like this :
var chunks = query.GetChunks(3);
N.B.
If the number of elements is not divisible by the number of groups, the first groups will be bigger, e.g. [0,1,2,3,4] --> [0,1] - [2,3] - [4]
I'm wondering if anyone knows a better (as in faster) algorithm/solution to solve my problem:
In my program I have an array of uints, from which I want to remove the entries contained in another uint array. However, I cannot use the union of the sets, because I need to keep duplicate values. That is a badly worded explanation, but the example should make it a bit clearer:
uint[] array_1 = new uint[7] { 1, 1, 1, 2, 3, 4, 4};
uint[] array_2 = new uint[4] { 1, 2, 3, 4 };
uint[] result = array_1.RemoveRange(array_2);
// result should be: { 1, 1, 4 }
This is my current best idea; but it's fairly slow:
public static uint[] RemoveRange(this uint[] source_array, uint[] entries_to_remove)
{
int current_source_length = source_array.Length;
for (int i = 0; i < entries_to_remove.Length; i++)
{
for (int j = 0; j < current_source_length; j++)
{
if (entries_to_remove[i] == source_array[j])
{
// Shifts the entries in the source_array.
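// Only the (current_source_length - j - 1) entries after index j are moved; copying more would run past the end of the array.
Buffer.BlockCopy(source_array, (j + 1) * 4, source_array, j * 4, (current_source_length - j - 1) * 4);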
current_source_length--;
break;
}
}
}
uint[] new_array = new uint[current_source_length];
Buffer.BlockCopy(source_array, 0, new_array, 0, current_source_length * 4);
return new_array;
}
So again, can someone come up with a more clever approach to achieve what I want?
Thanks!
What about using a Dictionary<uint,int> using the uint number as the key and the number of times the number occurs as the value?
var source = new Dictionary<uint,int>();
source.Add(1,3);
source.Add(2,1);
source.Add(3,1);
source.Add(4,2);
var remove = new uint[]{ 1, 2, 3, 4 };
for (int i = 0; i<remove.Length; i++) {
int occurrences;
if (source.TryGetValue(remove[i], out occurrences)) {
if (occurrences>1) {
source[remove[i]] = occurrences-1;
} else {
source.Remove(remove[i]);
}
}
}
This would do what you want as far as I understand it. The key is reference counting: count the number of occurrences of each number, then use the remaining reference count (if > 0) as the number of times that number has to be emitted:
public static uint[] RemoveRange(this uint[] source_array, uint[] entries_to_remove)
{
var referenceCount = new Dictionary<uint, int>();
foreach (uint n in source_array)
{
if (!referenceCount.ContainsKey(n))
referenceCount[n] = 1;
else
referenceCount[n]++;
}
foreach (uint n in entries_to_remove)
{
if (referenceCount.ContainsKey(n))
referenceCount[n]--;
}
return referenceCount.Where(x => x.Value > 0)
.Select(x => Enumerable.Repeat(x.Key, x.Value))
.SelectMany( x => x)
.ToArray();
}
EDIT: This won't help you, since you want to keep duplicates.
I'm leaving it here for people who don't want duplicates.
Create a HashSet<T> from the second list, then call List<T>.RemoveAll with the hashset's Contains method.
var unwanted = new HashSet<uint>(...);
list.RemoveAll(unwanted.Contains);
If you don't want to remove them in-place, you can use LINQ:
list.Except(unwanted);
Except builds a hash set internally and returns items one at a time (deferred execution).
If the arrays aren't sorted, sort them. Initialize 3 indexes to 0. 's' (source) and 'd' (dest) index the big array A, 'r' indexes the "toRemove" array B.
While r<B.length,
While s<A.length and B[r] > A[s], A[d++]= A[s++].
If s<A.length and B[r]==A[s], s++.
r++.
Endwhile.
While s<A.length, A[d++]= A[s++].
A.length = d.
This takes no extra space, and runs in O(N) (or N lg N if they are initially unsorted), compared to the N^2 in your original solution.
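The pseudocode above can be sketched in C# roughly as follows. The names SortedRemoval and RemoveSorted are made up for illustration. Unlike the pseudocode, this version works on sorted copies so the caller's arrays stay untouched, which costs extra space, and the result comes back in sorted order.

```csharp
using System;

static class SortedRemoval
{
    // Removes one occurrence from source for each entry in toRemove.
    public static uint[] RemoveSorted(uint[] source, uint[] toRemove)
    {
        var a = (uint[])source.Clone();
        var b = (uint[])toRemove.Clone();
        Array.Sort(a);
        Array.Sort(b);

        int s = 0; // read position in a
        int d = 0; // write position in a
        foreach (uint x in b)
        {
            // Keep everything smaller than the entry to remove.
            while (s < a.Length && a[s] < x)
                a[d++] = a[s++];
            // Skip exactly one matching entry, if there is one.
            if (s < a.Length && a[s] == x)
                s++;
        }
        // Keep whatever is left after the last removal.
        while (s < a.Length)
            a[d++] = a[s++];

        Array.Resize(ref a, d);
        return a;
    }
}
```

For the arrays in the question, { 1, 1, 1, 2, 3, 4, 4 } minus { 1, 2, 3, 4 }, this returns { 1, 1, 4 }.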
You can try using Linq here,
var resultarray = array1.Except(array2);
I have an array of integers and need to find the position in the array of the maximum number along with the minimum. I have it working but it doesn't seem to be a very good way to do it. Can anyone suggest a better way to achieve what I have?
Here's my code:
int[] usageHours = { 3, 3, 5, 4, 0, 0, 2, 2, 4, 25, 158, 320, 212, 356, 401, 460, 480, 403, 298, 213, 102, 87, 34, 45 };
double myAverage = usageHours.Average();
int runningTotal = 0;
int runningMaxPosition = 0;
for (int i = 0; i < usageHours.Length; i++)
{
if (usageHours[i] > runningTotal)
{
runningMaxPosition = i;
runningTotal = usageHours[i];
}
}
txtmax.Text = Convert.ToString(runningMaxPosition)+" With: "+Convert.ToString(runningTotal)+" Users";
txtAv.Text = Convert.ToString(myAverage);
That code is mostly fine. I'd suggest changing the variable names a bit, but that's all. You can work out the minimum in the same loop. I've changed the "if" conditions very slightly to guarantee that they always pick out at least one element (even if all the values are, say, int.MinValue). There are other ways of approaching this, but this is one example. If you have an empty array, you'll end up with max=min=0, and both indexes=-1.
int currentMax = 0;
int currentMaxIndex = -1;
int currentMin = 0;
int currentMinIndex = -1;
for (int i = 0; i < usageHours.Length; i++)
{
if (currentMaxIndex == -1 || usageHours[i] > currentMax)
{
currentMaxIndex = i;
currentMax = usageHours[i];
}
if (currentMinIndex == -1 || usageHours[i] < currentMin)
{
currentMinIndex = i;
currentMin = usageHours[i];
}
}
Here's an alternative using nullable value types to represent "there were no values" answers:
int? currentMax = null;
int? currentMaxIndex = null;
int? currentMin = null;
int? currentMinIndex = null;
for (int i = 0; i < usageHours.Length; i++)
{
if (currentMax == null || usageHours[i] > currentMax.Value)
{
currentMaxIndex = i;
currentMax = usageHours[i];
}
if (currentMin == null || usageHours[i] < currentMin.Value)
{
currentMinIndex = i;
currentMin = usageHours[i];
}
}
Don't worry if you haven't come across nullable value types yet though...
The code looks OK for finding the max value. If you are using C# 3 or later you could use the LINQ extension methods (there are Min, Max and Average methods, and on List there is also a FindIndex method, amongst others), but I get the impression that you are learning programming, and then it is sometimes a good idea to implement stuff that may be built into the framework, just for the learning value.
I just wanted to provide one-liner solution for the question (for completeness).
In the OP's original question he only asks for index of the maximum and index of the minimum.
Let's stick to this question. This is the most interesting question because to find maximum value we can simply use Enumerable.Max LINQ method. The same goes for Min and Average.
Let's only provide index of the max, index of min can be retrieved with similar code.
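// The lambda parameter is named "best" so that it does not clash with the local indexOfMax (older compilers reject the clash).
int indexOfMax = Enumerable.Range(0, usageHours.Length).Aggregate(
(best, i) => (usageHours[i] > usageHours[best] ? i : best)
);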
The delegate passed to Aggregate is executed for each index of the array. It receives the "index of maximum value so far found" and the current index as parameters, and returns the new "index of maximum value so far found". In each iteration, that index only changes to the current index if the corresponding element of the array is greater than the previous maximum.
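For the index of the minimum, the same pattern with the comparison flipped should do. This is only a sketch: ExtremeIndex and IndexOfMin are made-up names.

```csharp
using System;
using System.Linq;

static class ExtremeIndex
{
    // Returns the index of the smallest element; on ties the first one wins
    // because the comparison is strict.
    public static int IndexOfMin(int[] data)
    {
        return Enumerable.Range(0, data.Length)
            .Aggregate((best, i) => data[i] < data[best] ? i : best);
    }
}
```

Note that Aggregate without a seed throws an InvalidOperationException for an empty sequence, so an empty array has to be handled separately.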
Scratch the LINQ code, it didn't work the way you wanted.
You could make your code a little bit more concise:
for (int i = 0; i < usageHours.Length; i++)
{
if (usageHours[i] > usageHours[runningMaxPosition])
runningMaxPosition = i;
}
All it does differently is leave out the temporary runningTotal variable.
How about this:
double average = usageHours.Average();
// Note: Max/Min with a selector return the largest/smallest values, not their positions.
int maxValue = Enumerable.Range(0, usageHours.Length).Max(i => usageHours[i]);
int minValue = Enumerable.Range(0, usageHours.Length).Min(i => usageHours[i]);
Your code isn't bad, but it won't work if all the values are less than zero.
Try this:
int getArrayMaxPosition (double[] theArray)
{
double maxVal = theArray[0];
int ret = 0;
int currentIndex = 0;
foreach (double aValue in theArray)
{
if (aValue > maxVal)
{
ret = currentIndex;
maxVal = aValue;
}
currentIndex++;
}
return ret;
}
As was mentioned in the comments on Jon's answer, Jon's solution really is the best, most direct, quickest way of doing it.
If, however, you did want to use Igor's solution, here's the rest of it (to get the actual positions as well as the values):
int maxValue = Enumerable.Range(0, usageHours.Length).Max(i => usageHours[i]);
int maxPosition = Array.FindIndex(usageHours, i => i == maxValue);
int minValue = Enumerable.Range(0, usageHours.Length).Min(i => usageHours[i]);
int minPosition = Array.FindIndex(usageHours, i => i == minValue);