I have a list of lists of int called NN, which I would like to write to a CSV file. It is declared like this:
List<List<int>> NN = new List<List<int>>();
The NN list:
1,2,3,4,5,6
2,5,6,3,1,0
0,9,2,6,7,8
And the output csv file should look like this:
1,2,0
2,5,9
3,6,2
4,3,6
5,1,7
6,0,8
What is the best way to achieve that?
If there is a better representation you would recommend instead of the nested list, I'll be glad to know.
(The purpose is that each list of int is the weights between the last and next layer in the neural network).
Here is how you can do it:
List<List<int>> NN = new List <List<int>>
{
new List<int> {1,2,3,4,5,6},
new List<int> {2,5,6,3,1,0},
new List<int> {0,9,2,6,7,8}
};
//Get expected number of rows
var numberOfRows = NN[0].Count;
var rows =
Enumerable.Range(0, numberOfRows) //For each row
.Select(row => NN.Select(list => list[row]).ToList()) //Get row data from all columns
.ToList();
StringBuilder sb = new StringBuilder();
foreach (var row in rows)
{
sb.AppendLine(string.Join(",", row));
}
var result = sb.ToString();
What you want to achieve is basically a matrix transpose and then write the data to a file.
The most efficient way to transpose a matrix is a complex question, and it really depends on your architecture.
If you're not concerned about super-optimizing for your processor (or accelerator), I would go for a simple nested for loop, accumulating the data in an intermediate in-memory representation:
string[] lines = new string[NN[0].Count]; // assume all lines have equal length
for(int i = 0; i < NN.Count; ++i) {
for(int j = 0; j < NN[i].Count; ++j) {
lines[j] += NN[i][j] + ((i==NN.Count - 1) ? "" : ",");
}
}
File.WriteAllLines("path.csv", lines);
As a first optimization pass, I wouldn't recommend using a list of lists, since accessing elements will be quite intensive. A two-dimensional array would do a better job.
int[,] NN = new int[3,6] {{1, 2, 3, 4, 5, 6 }, {2, 5, 6, 3, 1, 0}, {0, 9, 2, 6, 7, 8}};
string[] lines = new string[NN.GetLength(1)];
for (int i = 0; i < NN.GetLength(0); ++i)
{
for (int j = 0; j < NN.GetLength(1); ++j)
{
lines[j] += NN[i,j] + ((i == NN.GetLength(0) - 1) ? "" : ",");
}
}
File.WriteAllLines("path.csv", lines);
Here is a performance test, for 500x500 elements (without counting the write to file):
To improve this solution, I would first do the transpose in memory (without writing anything to a file or to strings), then join the values with commas and write the result to the file in one go (as a single byte array).
If you want to optimize further, there is always room for it :)
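A minimal sketch of that two-step idea, assuming a rectangular int[,] input. The names CsvTransposeSketch, ToTransposedCsv and Write, as well as the path argument, are placeholders for illustration and not part of the original answer.

```csharp
using System.IO;
using System.Text;

public static class CsvTransposeSketch
{
    // Builds the CSV text of the transposed matrix:
    // column j of the input becomes line j of the output.
    public static string ToTransposedCsv(int[,] matrix)
    {
        int rows = matrix.GetLength(0);
        int cols = matrix.GetLength(1);

        // Step 1: transpose in memory, with no string work yet.
        var transposed = new int[cols, rows];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                transposed[j, i] = matrix[i, j];

        // Step 2: join each transposed row with commas into one buffer.
        var sb = new StringBuilder();
        for (int j = 0; j < cols; j++)
        {
            for (int i = 0; i < rows; i++)
            {
                if (i > 0) sb.Append(',');
                sb.Append(transposed[j, i]);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }

    // Writes the whole buffer in a single call (path is a placeholder).
    public static void Write(int[,] matrix, string path)
    {
        File.WriteAllText(path, ToTransposedCsv(matrix));
    }
}
```

Calling CsvTransposeSketch.Write(NN, "path.csv") would then replace the nested loop and File.WriteAllLines above. Whether it is actually faster depends on the data size and should be measured.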
For example, on x86, depending on the instruction set you have, you can read this article. On a CUDA enabled device, you can read that.
Anyway, a good solution will always involve aligned memory, sub-blocking, and close-to-metal written code (intrinsics or assembly).
EDIT: Benchmarks for different techniques published at the bottom of this question.
I have a very large List<int> full of integers. I want to remove every occurrence of "3" from the List<int>. Which technique would be most efficient to do this? I would normally call .Remove(3) until it returns false, but I fear that each call to .Remove(3) internally loops through the entire List<int> unnecessarily.
EDIT: It was recommended in the comments to try
TheList = TheList.Where(x => x != 3).ToList();
but I need to remove the elements without instantiating a new List.
var TheList = new List<int> { 5, 7, 8, 2, 8, 3, 1, 0, 6, 3, 9, 3, 5, 2, 7, 9, 3, 5, 5, 1, 0, 4, 5, 3, 5, 8, 2, 3 };
//technique 1
//this technique has the shortest amount of code,
//but I fear that every time the Remove() method is called,
//the entire list is internally looped over again starting at index 0
while (TheList.Remove(3)) { }
//technique 2
//this technique is an attempt to keep the list from
//being looped over every time an element is removed
for (var i = 0; i < TheList.Count; i++)
{
if (TheList[i] == 3)
{
TheList.RemoveAt(i);
i--;
}
}
Are there any better ways to do this?
Benchmarks
I tested three techniques to remove 10,138 items from a list of 100,000 elements: the two shown above, and one recommended by Serg in an answer. These are the results:
'while' loop: 179.6808ms
'for' loop: 65.5099ms
'RemoveAll' predicate: 0.5982ms
Benchmark Code:
var RNG = new Random();
//inclusive min and max random number
Func<int, int, int> RandomInt = delegate (int min, int max) { return RNG.Next(min - 1, max) + 1; };
var TheList = new List<int>();
var ThreeCount = 0;
for (var i = 0; i < 100000; i++)
{
var TheInteger = RandomInt(0, 9);
if (TheInteger == 3) { ThreeCount++; }
TheList.Add(TheInteger);
}
var Technique1List = TheList.ToList();
var Technique2List = TheList.ToList();
var Technique3List = TheList.ToList();
<div style="background-color:aquamarine;color:#000000;">Time to remove #ThreeCount items</div>
//technique 1
var Technique1Stopwatch = Stopwatch.StartNew();
while (Technique1List.Remove(3)) { }
var Technique1Time = Technique1Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 1: #(Technique1Time)ms ('while' loop)</div>
//technique 2
var Technique2Stopwatch = Stopwatch.StartNew();
for (var i = 0; i < Technique2List.Count; i++)
{
if (Technique2List[i] == 3)
{
Technique2List.RemoveAt(i);
i--;
}
}
var Technique2Time = Technique2Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 2: #(Technique2Time)ms ('for' loop)</div>
//technique 3
var Technique3Stopwatch = Stopwatch.StartNew();
var RemovedCount = Technique3List.RemoveAll(x => x == 3);
var Technique3Time = Technique3Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 3: #(Technique3Time)ms ('RemoveAll' predicate)</div>
You can just use List<T>.RemoveAll and pass your predicate - https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.removeall?view=net-6.0#System_Collections_Generic_List_1_RemoveAll_System_Predicate__0__ . This is guaranteed to have linear complexity, O(list.Count).
TheList.RemoveAll(x=>x==3);
Additionally, RemoveAll performs some GC-specific things internally, so I think in some cases this may provide some additional performance advantages against the simple hand-made loop implementation (but I'm unsure here).
If you want to do it all yourself, you can check out the implementation of RemoveAll here. Generally, it is just a while loop as in your question.
Additionally, as we can see from the GitHub implementation (and as Jon Skeet mentioned in the comments), the remove operation causes the rest of the list (all items after the first removed item) to be copied (shifted) into the free space introduced by the deletion. So, if you have a really huge list and/or want to remove items frequently, you may consider switching to some other data structure, such as a linked list.
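To illustrate why a single pass is enough, here is a rough sketch of the compaction idea: kept elements are copied forward over removed ones, and the leftover tail is dropped once at the end. This is not the framework source; CompactionSketch and RemoveAllInPlace are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class CompactionSketch
{
    // Removes every element matching the predicate in a single pass
    // and returns the number of removed elements.
    public static int RemoveAllInPlace(List<int> list, Predicate<int> match)
    {
        int write = 0; // next slot for an element we keep
        for (int read = 0; read < list.Count; read++)
        {
            if (!match(list[read]))
            {
                list[write] = list[read];
                write++;
            }
        }

        int removed = list.Count - write;
        // Drop the leftover tail in one call instead of shifting per removal.
        list.RemoveRange(write, removed);
        return removed;
    }
}
```

Each element is read once and written at most once, which is where the linear bound comes from. In practice, calling TheList.RemoveAll(x => x == 3) directly is the simpler choice.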
I want to make a program where users have to enter a list of 100 numbers.
The outcome has to be a two-dimensional matrix with 34 rows and 3 columns (where the last row has only 1 number, of course).
Now I want to: first sort the array by ascending order. Then I want to sort each row separately by descending order.
I'll demonstrate with a two dimensional array containing 10 elements
If these are the numbers the user enters: 2, 4, 6, 9, 5, 2, 3, 4, 9, 7
I want the array to look like this:
3 2 2
5 4 4
9 7 6
9
I think it's easier to start with a 1D array which you can later copy to a 2D array if you really have to.
Array.Sort(input); //input is an int[100]
//Now, sort each row in descending order
var comparer = Comparer<int>.Create((a, b) => b.CompareTo(a));
for (int row = 0; row < 99; row += 3) {
Array.Sort(input, row, 3, comparer);
}
It's straightforward if you do the sorting and restructuring as separate steps:
Collect the results in a flat (i.e. one-dimensional) array.
Sort the flat array (e.g. with Array.Sort(...)).
Build up your new data structure by looping through the flat array. You don't need to do any further sorting here. Each time just take [arr[n+2], arr[n+1], arr[n]] as the row in the new 2D array and jump forward to n = n + 3.
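The steps above can be sketched as follows. This version uses a jagged array so that the last row can be shorter than the others, and it sorts a copy so the caller's array is left untouched; both are choices made for this example, and RowBuilderSketch and BuildRows are hypothetical names.

```csharp
using System;

public static class RowBuilderSketch
{
    // Sorts a copy of the input ascending, then cuts it into rows of
    // `columns` values, writing each row in descending order.
    public static int[][] BuildRows(int[] input, int columns)
    {
        var sorted = (int[])input.Clone();
        Array.Sort(sorted);

        int rowCount = (sorted.Length + columns - 1) / columns; // round up
        var rows = new int[rowCount][];

        for (int r = 0; r < rowCount; r++)
        {
            int start = r * columns;
            // The last row may hold fewer values than the others.
            int length = Math.Min(columns, sorted.Length - start);
            var row = new int[length];
            for (int c = 0; c < length; c++)
                row[c] = sorted[start + length - 1 - c]; // reverse within the row
            rows[r] = row;
        }
        return rows;
    }
}
```

For the ten sample numbers from the question and three columns, this yields the rows 3 2 2, 5 4 4, 9 7 6 and 9.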
I also favour creating the output structure by looping over the sorted input structure. The following will achieve what you want. It will take any size array of integers and the required number of columns for the two-dimensional output array and return the result as you defined it.
public static int?[,] SortInput(int[] input, int requiredColumnCount)
{
// Guard conditions.
if (input == null)
throw new ArgumentNullException(nameof(input));
if (input.Length < 1)
throw new ArgumentOutOfRangeException(nameof(input));
if (requiredColumnCount < 1)
throw new ArgumentOutOfRangeException(nameof(requiredColumnCount));
var inputLength = input.Length;
// Sort the input array in ascending order.
Array.Sort(input);
// Dimension the output array.
var requiredRowCount = (int)Math.Ceiling((decimal)inputLength / requiredColumnCount);
var output = new int?[requiredRowCount, requiredColumnCount];
// Setup variables to check for special handling of last output row.
var lastRowIndex = output.GetUpperBound(0);
var columnCountForLastRow = inputLength % requiredColumnCount;
// Populate the output array.
for (var inputIndex = 0; inputIndex < inputLength; inputIndex += requiredColumnCount)
{
var rowIndex = inputIndex / requiredColumnCount;
// Special handling may be required if there are insufficient
// input values to fully populate the last output row.
if ((rowIndex == lastRowIndex) && (columnCountForLastRow != 0))
requiredColumnCount = columnCountForLastRow;
for (var columnIndex = 0; columnIndex < requiredColumnCount; columnIndex++)
{
output[rowIndex, columnIndex] = input[inputIndex + requiredColumnCount - columnIndex - 1];
}
}
return output;
}
I have classic find duplicate algorithm like this:
int n = int.Parse(Console.ReadLine());
Console.WriteLine();
List<int> tempArr = new List<int>();
List<int> array = new List<int>();
for (int i = 0; i < n; i++)
{
Console.Write("input number {0}: ", i + 1);
tempArr.Add(int.Parse(Console.ReadLine()));
}
tempArr.Sort();
for (int i = 0; i < n; i++)
{
for (int j = i+1; j < n; j++)
{
if (tempArr[i] == tempArr[j])
{
array.Add(tempArr[i]);
}
}
}
Everything works okay, but if I have just two duplicate numbers, as in (1,2,2,3,4,5), how can I add them both to the List<int> array in one clean pass of the loop?
Instead of lists, you could use a data structure with better search capability (a hash table or a binary tree, for example). Even if you have just one duplicate, you need to check whether you have already seen the element, so the key operation in your algorithm is the search. The faster the search, the faster the algorithm. With binary search over sorted data you get O(n log n) overall (n searches of O(log n) each); a hash table brings each lookup down to expected constant time.
An even better way to do this is to have some kind of array that has the same size as your input range and "tick" each value that you already have. This search runs in constant time, but gets inefficient if you have a large range of input.
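A small sketch of the "tick" array idea, assuming every input value lies between 0 and a known maxValue (that bound is an assumption of this example, as are the names TickArraySketch and FindDuplicates). It counts occurrences instead of ticking a flag, so that both copies of a duplicated value can be returned.

```csharp
using System.Collections.Generic;

public static class TickArraySketch
{
    // Returns every value that occurs more than once, repeated as many
    // times as it occurs. Assumes 0 <= value <= maxValue for all inputs.
    public static List<int> FindDuplicates(IEnumerable<int> input, int maxValue)
    {
        var counts = new int[maxValue + 1];
        foreach (int value in input)
            counts[value]++; // constant-time "tick"

        var duplicates = new List<int>();
        for (int value = 0; value <= maxValue; value++)
        {
            if (counts[value] > 1)
            {
                for (int k = 0; k < counts[value]; k++)
                    duplicates.Add(value);
            }
        }
        return duplicates;
    }
}
```

The cost is one pass over the input plus one pass over the range, which is why it stops being attractive when the range is much larger than the input.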
You can use distinct:
array = tempArr.Distinct().ToList();
Note that Distinct returns each value once, so it removes duplicates rather than collecting them. It is hash-based, so it runs in expected linear time. If you know more about the input, for example that the integers fall within a certain range, you can also do this in "one clean shot" with a simple lookup array.
To extract all the duplicates you can use Linq:
List<int> tempList = new List<int>() { 1, 2, 2, 3, 4, 5 };
// array == [2, 2]
List<int> array = tempList
.GroupBy(x => x)
.Where(x => x.Count() > 1)
.SelectMany(x => Enumerable.Repeat(x.Key, x.Count()))
.ToList();
There must be a better way to do this, I'm sure...
// Simplified code
var a = new List<int>() { 1, 2, 3, 4, 5, 6 };
var b = new List<int>() { 2, 3, 5, 7, 11 };
var z = new List<int>();
for (int i = 0; i < a.Count; i++)
if (b.Contains(a[i]))
z.Add(a[i]);
// (z) contains all of the numbers that are in BOTH (a) and (b), i.e. { 2, 3, 5 }
I don't mind using the above technique, but I want something fast and efficient (I need to compare very large Lists<> multiple times), and this appears to be neither! Any thoughts?
Edit: As it makes a difference - I'm using .NET 4.0, the initial arrays are already sorted and don't contain duplicates.
You could use Enumerable.Intersect.
var z = a.Intersect(b);
which will probably be more efficient than your current solution.
Note that you left out one important piece of information: whether the lists happen to be ordered or not. If they are, a couple of nested loops that pass over each input list exactly once may be faster, and a little more fun to write.
Edit
In response to your comment on ordering:
Here is a first stab at looping. It will need a little tweaking on your behalf, but works for your initial data.
int j = 0;
foreach (var i in a)
{
int x = b[j];
while (x < i)
{
if (x == i)
{
z.Add(b[j]);
}
j++;
x = b[j];
}
}
This is where you need to add some unit tests ;)
Edit
Final point: it may well be that LINQ can use a SortedList to perform this intersection very efficiently. If performance is a concern, it is worth testing the various solutions. Don't forget to take the sorting into account if you load your data in an unordered manner.
One final edit: because there has been some to and fro on this, and people may be using the above without properly debugging it, I am posting a later version here:
int j = 0;
int b1 = b[j];
foreach (var a1 in a)
{
while (b1 <= a1)
{
if (b1 == a1)
z1.Add(b[j]);
j++;
if (j >= b.Count)
break;
b1 = b[j];
}
}
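For comparison, here is a compact two-index sketch of the same idea. It assumes both lists are sorted ascending and contain no duplicates, as stated in the question; SortedIntersectSketch and Intersect are names invented for this example.

```csharp
using System.Collections.Generic;

public static class SortedIntersectSketch
{
    // Walks both sorted lists once, advancing whichever side holds the
    // smaller value and recording values that appear on both sides.
    public static List<int> Intersect(IList<int> a, IList<int> b)
    {
        var result = new List<int>();
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j])
            {
                i++;
            }
            else if (a[i] > b[j])
            {
                j++;
            }
            else
            {
                result.Add(a[i]);
                i++;
                j++;
            }
        }
        return result;
    }
}
```

Because each step advances at least one index, the loop runs at most a.Count + b.Count times, and it also handles empty inputs without an index check up front.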
There's Enumerable.Intersect, but since this is an extension method, I doubt it will be very efficient.
If you want efficiency, take one list and turn it into a Set, then go over the second list and see which elements are in the set. Note that I preallocate z, just to make sure you don't suffer from any reallocations.
var set = new HashSet<int>(a);
var z = new List<int>(Math.Min(set.Count, b.Count));
foreach(int i in b)
{
if(set.Contains(i))
z.Add(i);
}
This runs in expected O(N+M) time (N and M being the sizes of the two lists), assuming well-distributed hash codes.
Now, you could use set.IntersectWith(b), and I believe it will be just as efficient, but I'm not 100% sure.
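A short sketch of the IntersectWith variant, for anyone who wants to try it. The wrapper names HashIntersectSketch and Intersect are made up for this example; HashSet<int>.IntersectWith itself is a standard method that modifies the set in place.

```csharp
using System.Collections.Generic;

public static class HashIntersectSketch
{
    // Copies `a` into a set, then keeps only the values also present in `b`.
    public static HashSet<int> Intersect(IEnumerable<int> a, IEnumerable<int> b)
    {
        var set = new HashSet<int>(a);
        set.IntersectWith(b); // mutates `set`, leaves `a` and `b` untouched
        return set;
    }
}
```

Note that a HashSet does not promise any enumeration order, so sort the result if the order matters.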
The Intersect() method does just that. From MSDN:
Produces the set intersection of two sequences by using the default
equality comparer to compare values.
So in your case:
var z = a.Intersect(b);
Use SortedSet<T> in System.Collections.Generic namespace:
SortedSet<int> a = new SortedSet<int>() { 1, 2, 3, 4, 5, 6 };
SortedSet<int> b = new SortedSet<int>() { 2, 3, 5, 7, 11 };
a.IntersectWith(b); // a now contains only the common values: 2, 3, 5
But surely you have no duplicates!
Your second collection does not need to be a SortedSet, though. It can be any collection (IEnumerable<T>), but internally the method acts in such a way that if the second collection is also a SortedSet<T>, the operation is O(n).
If you can use LINQ, you could use the Enumerable.Intersect() extension method.
I have a list of int values, something like below (the upper and lower bounds are dynamic):
1, 2, 3
4, 6, 0
5, 7, 1
I want to sum the values of each column, like this:
1 + 4 + 5 = 10
2 + 6 + 7 = 15
3 + 0 + 1 = 4
Expected Result = 10,15,4
Any help would be appreciated
Thanks
Deepu
Here's the input data using array literals, but the subsequent code works exactly the same on arrays or lists.
var grid = new []
{
new [] {1, 2, 3},
new [] {4, 6, 0},
new [] {5, 7, 1},
};
Now produce a sequence with one item for each column (take the number of elements in the shortest row), in which the value of the item is the sum of the row[column] value:
var totals = Enumerable.Range(0, grid.Min(row => row.Count()))
.Select(column => grid.Sum(row => row[column]));
Print that:
foreach (var total in totals)
Console.WriteLine(total);
If you use a 2D array you can just sum the first, second,... column of each row.
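As a sketch of the 2D-array variant, assuming the data is held in a rectangular int[,] (ColumnSumSketch and ColumnSums are illustrative names):

```csharp
public static class ColumnSumSketch
{
    // Adds up each column of a rectangular 2D array.
    public static int[] ColumnSums(int[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        var totals = new int[cols];

        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                totals[c] += grid[r, c];

        return totals;
    }
}
```

For the grid in the question this gives 10, 15 and 4.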
If you use a 1D array you can simply use a modulo:
int[] results = new int[colCount];
for (int i = 0; i < list.Count; i++)
{
results[i%colCount] += list[i];
}
Do you have to use a List object? If not, I would use a two-dimensional array.
Otherwise, you could simply work out how to access rows and columns separately, so that you can add the numbers within a simple for loop. It depends on the methods of the List object.
Quite inflexible based on the question, but how about:
int ans = 0;
for(int i = 0; i < list.Count; i+=3)
{
ans+= list[i];
}
You could either run the same thing 3 times with a different initial iterator value, or put the whole thing in another loop with startValue as an iterator that runs 3 times.
Having said this, you may want to a) look at a different way of storing your data if, indeed, they are in a single list, or b) look at more flexible ways to do this, or wrap it in a function that lets you take into account different column counts, etc.
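One way such a function might look, assuming the values sit row by row in a single flat list. FlatColumnSumSketch, ColumnSums and the columnCount parameter are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class FlatColumnSumSketch
{
    // Sums each column of a flat, row-major list with `columnCount` columns.
    public static int[] ColumnSums(IList<int> flat, int columnCount)
    {
        if (columnCount < 1)
            throw new ArgumentOutOfRangeException(nameof(columnCount));

        var totals = new int[columnCount];
        for (int i = 0; i < flat.Count; i++)
            totals[i % columnCount] += flat[i]; // i % columnCount is the column index

        return totals;
    }
}
```

A single pass covers all columns, so there is no need to rerun the loop once per starting offset.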
Cheers,
Adam