C# Excel OutOfMemory exception caused by inserting to a range - c#

I've been working on a quite complex C# VSTO project that does a lot of different things in Excel. However, I have recently stumbled upon a problem I have no idea how to solve. I'm afraid that putting the whole project here will overcomplicate my question and confuse everyone so this is the part with the problem:
//this is a simplified version of the Range declaration, which I am 100% confident in
Range range = worksheet.Range[firstCell, lastCell];
range.Formula = array;
//where array is an object[,] which basically contains only strings and also works perfectly fine
The last line, which is supposed to insert the [,] array into an Excel range, used to work for smaller workbooks, but now crashes for bigger ones with a System.OutOfMemoryException: Insufficient memory to continue the execution of the program, and I have no idea why: it used to work with arrays of 500+ elements along one dimension, whereas now it crashes for an array with under 400. Furthermore, RAM usage is about 1.2 GB at the moment of the crash, and I know this project is capable of running perfectly fine at ~3 GB.
I have tried the following: inserting the array row by row, then cell by cell, and calling GC.Collect() before each insertion of a row or a cell, but it would nonetheless crash with a System.OutOfMemoryException.
So I would appreciate any help in solving this problem or identifying where the error could possibly be hiding, because I can't wrap my head around why it refuses to work for arrays with a smaller length (but maybe slightly bigger contents) at a RAM usage of 1.2 GB, which is about a third of what it used to handle. Thank you!
EDIT
I've been told in the comments that the code above might be too sparse, so here is a more detailed version (I hope it's not too confusing):
List<object[][]> controlsList = new List<object[][]>();
// this list is filled by a quite long method calling a lot of other functions
// if the other parts look fine, I guess I'll have to investigate it
int totalRows = 1;
foreach (var control in controlsList)
{
    if (control.Length == 0)
        continue;
    var range = worksheet.GetRange(totalRows + 1, 1, totalRows += control.Length, 11);
    // control is an object[n][11], so normally there are no index issues with inserting
    range.Formula = control.To2dArray();
}
//GetRange and To2dArray are extension methods
public static Range GetRange(this Worksheet sheet, int firstRow, int firstColumn, int lastRow, int lastColumn)
{
    var firstCell = sheet.GetRange(firstRow, firstColumn);
    var lastCell = sheet.GetRange(lastRow, lastColumn);
    return (Range)sheet.Range[firstCell, lastCell];
}

public static Range GetRange(this Worksheet sheet, int row, int col) =>
    (Range)sheet.CheckIsPositive(row, col).Cells[row, col];

public static T CheckIsPositive<T>(this T returnedValue, params int[] vals)
{
    if (vals.Any(x => x <= 0))
        throw new ArgumentException("Values must be positive");
    return returnedValue;
}

public static T[,] To2dArray<T>(this T[][] source)
{
    if (source == null)
        throw new ArgumentNullException(nameof(source));
    int l1 = source.Length;
    int l2 = source[0].Length; // assumes all rows have the same length
    T[,] result = new T[l1, l2];
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            result[i, j] = source[i][j];
    return result;
}

I'm not 100% sure I figured it out correctly, but it seems the issue lies in Interop.Excel/Excel limitations on the length of the formulas I'm trying to insert: whenever a formula's length approaches 8,192 characters, which is Excel's limit for formula contents, the System.OutOfMemoryException pops up. When I opted to leave the lengthy formulas out, the program started working fine.
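To make that workaround concrete, here is a minimal sketch of a pre-check (the helper name is illustrative, 8,192 is Excel's documented formula-length limit, and the actual Interop assignment is omitted):

```csharp
using System;
using System.Collections.Generic;

static class FormulaGuard
{
    // Excel's documented limit for formula contents is 8,192 characters.
    public const int MaxFormulaLength = 8192;

    // Returns the coordinates of every entry at or over the limit, so those
    // cells can be logged, skipped, or written as plain values instead.
    public static List<(int Row, int Col)> FindOversizedFormulas(object[,] array)
    {
        var oversized = new List<(int, int)>();
        for (int i = 0; i < array.GetLength(0); i++)
            for (int j = 0; j < array.GetLength(1); j++)
                if (array[i, j] is string s && s.Length >= MaxFormulaLength)
                    oversized.Add((i, j));
        return oversized;
    }
}
```

Running this over each control's 2-D array before assigning range.Formula at least pinpoints which cells would trip the limit.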

Related

Quick way of checking if a 2D array contains an element c#

I'm currently coding battleships as a part of a college project. The game works perfectly fine but I'd like to implement a way to check if a ship has been completely sunk. This is the method I'm currently using:
public static bool CheckShipSunk(string[,] board, string ship){
for(int i = 0; i < board.GetLength(0); i++){
for(int j = 0; j < board.GetLength(1); j++){
if(board[i,j] == ship){return false;}
}
}
return true;
}
The problem with this is that there are 5 ships, and this is very inefficient when checking hundreds of elements 5 times over, not to mention the sub-par quality of college computers. Is there an easier way of checking if a 2D array contains an element?
Use an arithmetic approach to walk the board with just one loop.
public static bool CheckShipSunk(string[,] board, string ship)
{
    int rows = board.GetLength(0);
    int cols = board.GetLength(1);
    for (int i = 0; i < rows * cols; i++)
    {
        int row = i / cols;
        int col = i % cols;
        if (board[row, col] == ship)
            return false;
    }
    return true;
}
But I am with Nysand on just caching and storing that information in cells. The code above, although it might work, is not recommended, as it is still not as efficient.
this is very inefficient when checking hundreds of elements 5 times over
Have you done any profiling? Computers are fast, even your old college computers; checking hundreds of elements should take microseconds. From Donald Knuth's famous quote:
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
So if you feel your program is slow, I would recommend starting with profiling. If you are at university, this might be a very valuable skill to learn.
There are also better algorithms/data structures that could be employed. I would, for example, expect each ship to know which locations it occupies, and various other information, like whether it is sunk at all. Selecting appropriate data structures is also a very important skill to learn, but a difficult one. Also, try not to get stuck in analysis paralysis: a terribly inefficient, ugly, working solution is still better than the most beautiful code that does not work.
However, a very easy thing to fix is moving .GetLength out of the loop. It is a comparatively slow call, and doing it only once should make your loop several times faster for almost no effort. You might also consider replacing the strings with some other identifier, like an int.
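As a sketch of the "each ship knows its locations" idea (all names here are illustrative, not from the original game):

```csharp
using System;
using System.Collections.Generic;

class Ship
{
    // The cells of this ship that have not been hit yet.
    private readonly HashSet<(int Row, int Col)> remaining;

    public Ship(IEnumerable<(int Row, int Col)> cells)
        => remaining = new HashSet<(int Row, int Col)>(cells);

    public bool IsSunk => remaining.Count == 0;

    // O(1) per shot instead of rescanning the whole board; returns true if it was a hit.
    public bool RegisterHit(int row, int col) => remaining.Remove((row, col));
}
```

With this shape, "is the ship sunk?" is a constant-time property check rather than a full board scan.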
Willy-nilly, you have to scan either the entire array or everything up to the first ship cell.
You can simplify the code by querying the array with the help of LINQ, but that will not increase performance; it still has O(length * width) time complexity.
using System.Linq;
...
// No explicit loops, just a query (.NET will loop for you)
public static bool CheckShipSunk(string[,] board, string ship) => board is null
    ? throw new ArgumentNullException(nameof(board))
    : !board.Cast<string>().Any(item => item == ship);
If you are looking for performance (say, you have a really huge array, many ships to test etc.), I suggest changing the data structure: Dictionary<string, (int row, int col)>
instead of the string[,] array:
Dictionary<string, (int row, int col)> sunkShips =
    new Dictionary<string, (int row, int col)>(StringComparer.OrdinalIgnoreCase) {
        { "Yamato", (15, 46) },
        { "Bismark", (11, 98) },
        { "Prince Of Wales", (23, 55) },
    };
and then get it as easy as
public static bool CheckShipSunk(IDictionary<string, (int row, int col)> sunkShips,
                                 string ship) =>
    sunkShips?.ContainsKey(ship) ?? false;
Note that the time complexity is O(1), which means it doesn't depend on the board's length and width.

C# IronXL (Excel) + LINQ memory issue

My goal is to find all the cells in an Excel file containing a specific text. The file is quite large (about 2 MB) and has about 22 sheets. Historically we had problems with Interop, so I found IronXL, and I love the way it operates.
The problem is that at some point the RAM usage increases above 2 GB, and of course it becomes very slow.
I'm aware of the materialization issue, so I'm trying to avoid ToList() or Count() when using LINQ.
The first "problem" I found with IronXL is that the Cell class doesn't have any field specifying the name of the sheet that contains it, so I divided the code into 2 sections:
The LINQ query to find all the cells containing the text
Then I iterate over all the previous results to store the desired cell info + the sheet name where it was found in my custom class MyCell
The custom class:
class MyCell
{
    public int X;
    public int Y;
    public string Location;
    public string SheetName;

    public MyCell(int x, int y, string location, string sheetName)
    {
        X = x;
        Y = y;
        Location = location;
        SheetName = sheetName;
    }
}
Here is my code:
List<MyCell> FindInExcel(WorkBook wb, string textToFind)
{
    List<MyCell> res = new List<MyCell>();
    var cells = from sheet in wb.WorkSheets
                from cell in sheet
                where cell.IsText && cell.Text.Contains(textToFind)
                select new { cell, sheet };
    foreach (var cell in cells)
    {
        res.Add(new MyCell(cell.cell.ColumnIndex, cell.cell.RowIndex, cell.cell.Location, cell.sheet.Name));
    }
    return res;
}
To test my method, I call:
WorkBook excel = WorkBook.Load("myFile.xlsx");
var results = FindInExcel(excel, "myText");
What happens when I execute and debug the code is indeed very weird. The LINQ query is executed very fast, and in my case I get 2 results. Then it starts iterating in the foreach, and the first 2 times the values are added to the list, so everything is perfect. But the 3rd time, when it evaluates whether any other item is available, the memory reaches 2 GB and it takes about 10 seconds.
I observed the same behaviour when I do this:
int count = cells.Count();
I'm aware this materializes the results, but what I don't understand is why I get the first 2 results in the foreach so fast, and it's only in the last step that the memory increases.
Seeing this behavior, it seems the code somehow knows how many items it has found without having to call Count(); otherwise, the first foreach iterations would be slow too.
Just to know if I was getting crazy, I tried to put this small code in the FindInExcel method:
int cnt = 0;
foreach (var cell in cells)
{
    res.Add(new MyCell(cell.cell.ColumnIndex, cell.cell.RowIndex, cell.cell.Location, cell.sheet.Name));
    cnt++;
    if (cnt == 2)
        break;
}
In this last case, I don't have the memory issue, and I finally get a list of the 2 cells I want.
What am I missing? Is there any way to do what I'm trying to do without materializing the results? I even tried moving to .NET Framework 4.8.1 to see if some bug had been fixed, but I get the same behavior.
Note: If I use this code in a small Excel, it runs very fast.
Thank you in advance!
I already found the issue. There was a sheet where a hidden formula had been extended down to the last cell (M1048576), so the query was searching all of those cells. Once it was removed, there is no memory issue anymore.
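In hindsight, the slow final step was just LINQ deferred execution: the query does no work until it is enumerated, and after yielding the last match the foreach's final MoveNext still has to scan every remaining cell before it can report that nothing else matches. A toy illustration of that shape (no IronXL involved; the names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Counts how many elements the query actually pulled from the source.
    public static int ScannedCount;

    public static IEnumerable<int> Source(int n)
    {
        for (int i = 0; i < n; i++)
        {
            ScannedCount++;   // one increment per cell the query visits
            yield return i;
        }
    }
}
```

Enumerating `DeferredDemo.Source(1000).Where(x => x < 2)` yields its two matches almost immediately, yet the loop only ends after the final MoveNext has walked the remaining 998 elements; with a sheet that secretly spans a million rows, that last step is where all the time and memory go.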
Thank you guys!

Why isn't this method returning anything?

This is for my internship, so I can't give much more context, but this method isn't returning the desired int and is causing an IndexOutOfRangeException instead.
The string[] taken into the method is composed of information from a handheld scanner used by my company's shipping department. The resulting dataAsByteArray is really a byte[][], so the .Length in the nested if statement gets the number of bytes of a bundle entry and then adds it to fullBundlePacketSize, as long as the resulting sum is less than 1000.
Why less than 1000? The bug I've been tasked with fixing is that some scanners (with older versions of Bluetooth) will only transmit about 1000 bytes of data to the receiver at a time. This method is meant to find how many bytes can be transmitted without cutting into a bundle entry (originally I had it hard-coded to just transmit 1000 bytes at a time, and that caused the receiver to get invalid bundle data).
The scanners are running a really old version of Windows CE and trying to debug in VS (2008) just opens an emulator for the device which doesn't help.
I'm sure it's something really simple, but I feel like a new set of eyes looking at it would help, so any help or solutions are greatly appreciated!
private int MaxPacketSizeForFullBundle(string[] data)
{
    int fullBundlePacketSize = 0;
    var dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
    for (int i = 0; i < dataAsByteArray.Length; i++)
    {
        if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
        {
            fullBundlePacketSize += dataAsByteArray[i].Length;
        }
    }
    return fullBundlePacketSize;
}
Take a look at your loop:
for (int i = 0; i < dataAsByteArray.Length; i++)
{
if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
^^^^^
I suspect you are throwing an exception because you are indexing an array beyond its length.
Do you mean this?
for (int i = 0; i < (dataAsByteArray.Length - 1); i++)
In conjunction with n8wrl's answer: your problem was an out-of-range access. I believe you were attempting to access the actual byte values inside the entries of dataAsByteArray.
string[] data = {"444", "abc445x"};
int x = MaxPacketSizeForFullBundle(data);
dataAsByteArray = {byte[2][]}
dataAsByteArray[0] contains {byte[3]}, which holds the value of each character in the string, so for "444" it would contain 52, 52, 52. You are probably attempting to access these individual values, which means you need to index into the nested bytes with byte[i][j] where 0 <= j < data[i].Length.
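For completeness, here is one guess at a fixed version of the method (the intent is assumed to be: accumulate whole entries until the next one would push the packet past the ~1000-byte limit; the cutoff and the break are my reading of the question, not confirmed by the asker):

```csharp
using System;
using System.Linq;
using System.Text;

static class PacketSizer
{
    // Accumulates entry sizes until adding the next entry would exceed
    // the ~1000-byte Bluetooth packet limit described in the question.
    public static int MaxPacketSizeForFullBundle(string[] data)
    {
        int fullBundlePacketSize = 0;
        var dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
        for (int i = 0; i < dataAsByteArray.Length; i++)
        {
            if (fullBundlePacketSize + dataAsByteArray[i].Length >= 1000)
                break; // this entry would not fit whole, so stop here
            fullBundlePacketSize += dataAsByteArray[i].Length;
        }
        return fullBundlePacketSize;
    }
}
```

Note it checks entry i (the one about to be added), not i + 1, so the index can never run past the end of the array.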

How to cycle through an (n by 12) 2D array

I have a 2-D array (with dimensions n by 5), which I'm picturing in my head like this (each box is an element of the array):
(http://tr1.cbsistatic.com/hub/i/2015/05/07/b1ff8c33-f492-11e4-940f-14feb5cc3d2a/12039.jpg)
In this image, n is 3; i.e., n is the number of columns and 5 is the number of rows in my array.
I want to find an efficient way to iterate (i.e. walk) through every path that leads from any cell in the leftmost column to any cell in the rightmost column, choosing one cell from every column in between.
It cannot be simply solved by n nested loops, because n is only determined at run time.
I think this means recursion is likely the best way forward, but can't picture how to begin theoretically.
Can you offer some advice as to how to cycle through every path? It seems simple enough, yet I can't tell what I'm doing wrong. Even a theoretical explanation without any code would be very much appreciated.
I'm coding in C#, Visual Studio in case that helps.
UPDATE: resolved using the code below from http://www.introprogramming.info/english-intro-csharp-book/read-online/chapter-10-recursion/#_Toc362296468
static void NestedLoops(int currentLoop)
{
    if (currentLoop == numberOfLoops)
    {
        return;
    }
    for (int counter = 1; counter <= numberOfIterations; counter++)
    {
        loops[currentLoop] = counter;
        NestedLoops(currentLoop + 1);
    }
}
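For reference, a self-contained variant of that tutorial snippet (the names are mine; it takes the counts as parameters instead of fields, and collects every path rather than printing):

```csharp
using System;
using System.Collections.Generic;

static class PathWalker
{
    // Enumerates every way to pick one of `rows` cells in each of `cols` columns,
    // i.e. every left-to-right path through the grid.
    public static List<int[]> AllPaths(int cols, int rows)
    {
        var result = new List<int[]>();
        Walk(new int[cols], 0, cols, rows, result);
        return result;
    }

    private static void Walk(int[] path, int col, int cols, int rows, List<int[]> result)
    {
        if (col == cols)
        {
            result.Add((int[])path.Clone()); // one complete path per leaf
            return;
        }
        for (int row = 0; row < rows; row++)
        {
            path[col] = row;               // choose a cell in this column
            Walk(path, col + 1, cols, rows, result);
        }
    }
}
```

The recursion depth equals the number of columns, so this simulates n nested loops with n known only at run time; `AllPaths(3, 5)` produces all 125 paths for the 3-by-5 example.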
This is a combinatorial explosion: there are 5^n paths, so you might quite quickly run into memory or value-limit issues.
Took some code from this SO post by Diego.
class Program
{
    static void Main(string[] args)
    {
        int n = 5;
        int r = 5;
        var combinations = Math.Pow(r, n);
        var list = new List<string>();
        for (Int64 i = 0; i < combinations; i++) // start at 0 so the all-first-cells path is included
        {
            var s = LongToBase(i);
            var fill = n - s.Length;
            list.Add(new String('0', fill) + s);
        }
        // list contains all your paths now
        Console.ReadKey();
    }

    private static readonly char[] BaseChars = "01234".ToCharArray();

    public static string LongToBase(long value)
    {
        long targetBase = BaseChars.Length;
        char[] buffer = new char[Math.Max((int)Math.Ceiling(Math.Log(value + 1, targetBase)), 1)];
        var i = (long)buffer.Length;
        do
        {
            buffer[--i] = BaseChars[value % targetBase];
            value = value / targetBase;
        }
        while (value > 0);
        return new string(buffer);
    }
}
list will now contain all the paths as numbers expressed in base 5, which can be used to work out each path. For example, "00123" means first cell, first cell, second cell, third cell and finally fourth cell.
Resolved: see the code posted in the edited question above, and the link to a recursion tutorial, which takes you through using recursion to simulate N nested iterative loops.

Generating a dataset with few unique values

Note: This is part 2 of a 2 part question.
Part 1 here
I want to learn more about sorting algorithms, and what better way to learn than to code! So I figure I need some data to work with.
My approach to creating some "standard" data will be as follows: create a set number of items; I'm not sure how large to make it, but I want to have fun and make my computer groan a little bit :D
Once I have that list, I'll push it into a text file and just read off that to run my algorithms against. I should have a total of 4 text files filled with the same data, just sorted differently, to run my algorithms against (see below).
Correct me if I'm wrong but I believe I need 4 different types of scenarios to profile my algorithms.
Randomly sorted data (for this I'm going to use the knuth shuffle)
Reversed data (easy enough)
Nearly sorted (not sure how to implement this)
Few unique (once again not sure how to approach this)
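For the "nearly sorted" case, one common sketch is to sort fully and then disturb the order with a handful of random adjacent swaps (the helper name and the choice of swap count are arbitrary knobs, not from any standard):

```csharp
using System;
using System.Linq;

static class NearlySorted
{
    // Produces 0..size-1 in order, then applies `swaps` random adjacent swaps,
    // so the result is sorted except for a few local disturbances.
    public static int[] Generate(int size, int swaps, Random r)
    {
        int[] array = Enumerable.Range(0, size).ToArray();
        for (int i = 0; i < swaps; i++)
        {
            int j = r.Next(0, size - 1);
            (array[j], array[j + 1]) = (array[j + 1], array[j]);
        }
        return array;
    }
}
```

A swap count of around 1% of the size keeps the data "nearly" sorted; crank it up to make the input messier.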
This question is for generating a list with a few unique items of data.
Which approach is best to generate a dataset with a few unique items?
Answering my own question here. Don't know if this is the best but it works.
public static int[] FewUnique(int uniqueCount, int returnSize)
{
    Random r = _random;
    int[] values = new int[uniqueCount];
    for (int i = 0; i < uniqueCount; i++)
    {
        values[i] = i;
    }
    int[] array = new int[returnSize];
    for (int i = 0; i < returnSize; i++)
    {
        array[i] = values[r.Next(0, values.Length)];
    }
    return array;
}
It might be worth having a look at NBuilder. It's a framework designed to generate objects for testing, and it sounds like just what you need.
You could deal with the "few unique" items with some code like this:
var products = Builder<YourClass>.CreateListOfSize(1000)
    .WhereAll().AreConstructedWith("some fixed value")
    .WhereRandom(20).AreConstructedWith("some other fixed value")
    .Build();
There's plenty of other variations you can use as well to get the data like you want it. Have a look at some of the samples on the site for more ideas.
http://pages.cs.wisc.edu/~bart/fuzz/ is all about fuzz testing, which focuses on semi-random data. It should be straightforward to adapt this approach to your problem.
I guess your solution is OK. I would only modify it slightly:
public static int[] FewUnique(int uniqueCount, int low, int high, int returnSize)
{
    Random r = _random;
    int[] values = new int[uniqueCount];
    for (int i = 0; i < uniqueCount; i++)
    {
        values[i] = r.Next(low, high);
    }
    int[] array = new int[returnSize];
    for (int i = 0; i < returnSize; i++)
    {
        array[i] = values[r.Next(0, values.Length)];
    }
    return array;
}
For some algorithms this might make a difference.