I'm importing an Excel file into a DataTable (dtImport) and rearranging that data into another DataTable (dtImportParsed).
Here's what that DataTable (dtImport) looks like when I first import it.
And this is how I'm trying to rearrange that DataTable (dtImportParsed):
I'm currently accomplishing this by using some nested for loops, but this takes a very long time. For example, a sheet with 36 columns and 4,000 rows takes about 30-40 minutes to complete. Is there an alternative method of accomplishing this that would speed things up?
Here's my code:
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
if (dtImportParsed.Rows.Count == 0)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
else
{
for (int i = 0; i < dtImportParsed.Rows.Count; i++)
{
if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
{
if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
{
dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
break;
}
}
else if (i == dtImportParsed.Rows.Count - 1)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
}
}
}
}
The algorithm you use to generate your expected result is too expensive! It runs in order (c × r × i), where i > r because empty fields are injected into the final table; in effect it is an O(n³) algorithm! You also perform it on DataTables by iterating DataRows, which is probably not efficient for your requirement.
If your source data set is not large (as you mentioned) and you have no memory restrictions, I suggest arranging the expected data set in memory using index-based data structures (dictionaries). Something like this:
var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>>();
The dragon enters! And eats the inner for loop:
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
// ...
// instead of: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
string date = dtImport.Columns[c].ColumnName.ToString().Trim();
string accountId = dtImport.Rows[r]["account_id"].ToString();
string eventName = dtImport.Rows[r]["Event Name"].ToString().Trim();
if (!arrangeDragon.ContainsKey(date))
arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());
if (!arrangeDragon[date].ContainsKey(accountId))
arrangeDragon[date][accountId] = new Dictionary<string, string>();
if (!arrangeDragon[date][accountId].ContainsKey(eventName))
arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();
// ...
}
}
These checks execute in O(1) instead of O(i), so the total cost drops to O(n²), which is inherent in iterating a table :)
Retrieval is also O(1):
string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual("42", data_field); // MSTest: expected value first, actual second
Now you can iterate the nested dictionaries once and build dtImportParsed.
If your data set is large or host memory is low, you will need another solution, but as you mentioned, that is not your case ;)
Good luck
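The last step, building dtImportParsed, could look roughly like the sketch below. It is a minimal sketch, not the answer's own code: instead of nested dictionaries it swaps in a single dictionary keyed by a (date, account) value tuple that points straight at the output DataRow, so finding the row to update is an O(1) hash lookup rather than a scan. ImportPivot and Build are hypothetical names, and it assumes dtImport has "account_id" and "Event Name" columns with the date columns starting at index 2, as in the question's loop. Unlike that loop it processes every row; if row 0 of your sheet is not data, skip it as the question does.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ImportPivot
{
    // Hypothetical helper: pivots dtImport (account_id, Event Name, <date columns...>)
    // into one row per (Date, account_id) with one column per event name.
    public static DataTable Build(DataTable dtImport, int firstDateColumn = 2)
    {
        var parsed = new DataTable();
        parsed.Columns.Add("Date", typeof(string));
        parsed.Columns.Add("account_id", typeof(string));

        // (date, account) -> output row; replaces the linear scan over
        // dtImportParsed.Rows that the original innermost loop performed.
        var rowFor = new Dictionary<(string, string), DataRow>();

        for (int c = firstDateColumn; c < dtImport.Columns.Count; c++)
        {
            string date = dtImport.Columns[c].ColumnName.Trim();
            foreach (DataRow src in dtImport.Rows)
            {
                string account = src["account_id"].ToString().Trim();
                string eventName = src["Event Name"].ToString().Trim();

                // Event columns are created the first time an event name is seen.
                if (!parsed.Columns.Contains(eventName))
                    parsed.Columns.Add(eventName, typeof(string));

                if (!rowFor.TryGetValue((date, account), out DataRow target))
                {
                    target = parsed.NewRow();
                    target["Date"] = date;
                    target["account_id"] = account;
                    parsed.Rows.Add(target);
                    rowFor.Add((date, account), target);
                }
                target[eventName] = src[c].ToString().Trim();
            }
        }
        return parsed;
    }
}
```

The work is now proportional to (date columns × rows), which matches the O(n²) estimate above.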
Related
I had an interviewer ask me to write a program in C# to figure out the maximum number of four-member families that can sit consecutively in a venue, given that the four members must be seated consecutively in a single row, with the following context:
N represents the number of rows available.
The columns are labeled from the letter "A" to "K", purposely omitting the letter "I" (in other words, {A,B,C,D,E,F,G,H,J,K}).
M represents a list of reserved seats
Quick example:
N = 2
M = {"1A","2F","1C"}
Solution = 3
In the representation you can see that, with the reservations and the size given, only three families of 4 can be seated consecutively.
How would you solve this? Is it possible to avoid for loops (LINQ solutions)?
I got mixed up in the for loops when trying to deal with the reservations array. My idea was to obtain all the reservations that a row has, but then I don't really know how to deal with the letters (converting directly from letter to number is a no-go because of the missing "I"), and you kind of need the letters to position the reserved seats anyway.
Any approach or insight on how to go about this problem would be nice.
Thanks in advance!
Here is another implementation.
I also tried to explain why certain things have been done.
Good luck.
private static int GetNumberOfAvailablePlacesForAFamilyOfFour(int numberOfRows, string[] reservedSeats)
{
// By just declaring the column names as a string of the characters
// we can query the column index by columnNames.IndexOf(char)
string columnNames = "ABCDEFGHJK";
// Here we transform the reserved seats to a matrix
// 1A 2F 1C becomes
// reservedSeatMatrix[0] = [0, 2] -> meaning row 1 and columns A and C, indexes 0 and 2
// reservedSeatMatrix[1] = [5] -> meaning row 2 and column F, index 5
List<List<int>> reservedSeatMatrix = new List<List<int>>();
for (int row = 0; row < numberOfRows; row++)
{
reservedSeatMatrix.Add(new List<int>());
}
foreach (string reservedSeat in reservedSeats)
{
int seatRow = Convert.ToInt32(reservedSeat.Substring(0, reservedSeat.Length - 1));
int seatColumn = columnNames.IndexOf(reservedSeat[reservedSeat.Length - 1]);
reservedSeatMatrix[seatRow - 1].Add(seatColumn);
}
// Then comes the evaluation.
// Which is simple enough to read.
int numberOfAvailablePlacesForAFamilyOfFour = 0;
for (int row = 0; row < numberOfRows; row++)
{
// Reset the number of consecutive seats at the beginning of a new row
int numberOfConsecutiveEmptySeats = 0;
for (int column = 0; column < columnNames.Length; column++)
{
if (reservedSeatMatrix[row].Contains(column))
{
// reset when a reserved seat is reached
numberOfConsecutiveEmptySeats = 0;
continue;
}
numberOfConsecutiveEmptySeats++;
if(numberOfConsecutiveEmptySeats == 4)
{
numberOfAvailablePlacesForAFamilyOfFour++;
numberOfConsecutiveEmptySeats = 0;
}
}
}
return numberOfAvailablePlacesForAFamilyOfFour;
}
static void Main(string[] args)
{
int familyPlans = GetNumberOfAvailablePlacesForAFamilyOfFour(2, new string[] { "1A", "2F", "1C" });
}
Good luck on your interview
As always, you will be asked how you could improve it, so be ready to discuss complexity: O(N), O(wtf).
The underlying implementation will always need for or foreach. Just as importantly, never do unnecessary work in a loop. For example, if there are only 3 seats left in a row, you don't need to keep hunting in that row because it is not possible to seat a family there.
This might help a bit:
var n = 2;
var m = new string[] { "1A", "2F", "1C" };
// We use a 2-dimensional bool array here. If memory is constrained, we can use BitArray.
var seats = new bool[n, 10];
// If you just need the count, you don't need a list. This is for returning more information.
var results = new List<object>();
// Set reservations.
foreach (var r in m)
{
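// The row number may have several digits ("12C"), so parse everything before the last character.
var row = int.Parse(r.Substring(0, r.Length - 1)) - 1;
var letter = r[r.Length - 1];
// If it's after 'H', then calculate index based on 'J'.
// 8 is index of J.
var col = letter > 'H' ? (8 + letter - 'J') : letter - 'A';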
seats[row, col] = true;
}
// Now all reserved seats should be marked as true.
// This is O(N*M) where N is number of rows, M is number of columns.
for (int row = 0; row < n; row++)
{
int start = -1;
int length = 0;
for (int col = 0; col < 10; col++)
{
if (start < 0)
{
if (!seats[row, col])
{
// If no run of consecutive seats has started and the current seat is available, let's start!
start = col;
length = 1;
}
}
else
{
// If a run has started, check whether we can reach 4 seats.
if (!seats[row, col])
{
length++;
if (length == 4)
{
results.Add(new { row, start });
start = -1;
length = 0;
}
}
else
{
// We won't be able to reach 4 seats, so reset
start = -1;
length = 0;
}
}
if (start < 0 && col > 6)
{
// We are on column H now (only have 3 seats left), and we do not have a consecutive sequence started yet,
// we won't be able to make it, so break and continue next row.
break;
}
}
}
var solution = results.Count;
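LINQ is, under the hood, the same kind of iteration as for and foreach. You could wrap the above in a custom iterator like:
class ConsecutiveEnumerator : IEnumerable // needs using System.Collections;
{
public IEnumerator GetEnumerator()
{
// yield return each run of free seats found by the loops above
throw new NotImplementedException();
}
}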
Then you could start using LINQ.
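For the LINQ part of the question, here is a minimal sketch with no explicit for loops, using the same gap-counting idea as the interval-based answer. FamilySeatingLinq and CountFamilies are hypothetical names; it assumes well-formed labels such as "12C" (any number of row digits followed by one letter from ABCDEFGHJK) and, like the answers here, no aisles inside a row. Rows without reservations are counted as well, because Enumerable.Range walks every row number.

```csharp
using System.Linq;

public static class FamilySeatingLinq
{
    // Column labels with 'I' deliberately omitted, as in the question.
    private const string ColumnNames = "ABCDEFGHJK";

    // Counts non-overlapping runs of familySize free seats in each row.
    public static int CountFamilies(int rows, string[] reserved, int familySize = 4)
    {
        // "12C" -> key 12 (row number), element 2 (index of 'C' in ColumnNames).
        var reservedByRow = reserved.ToLookup(
            s => int.Parse(s.Substring(0, s.Length - 1)),
            s => ColumnNames.IndexOf(s[s.Length - 1]));

        return Enumerable.Range(1, rows).Sum(row =>
        {
            // Sentinels -1 and 10 bracket the row, so each free stretch lies between
            // two consecutive entries; a stretch of g seats fits g / familySize families.
            var bounds = reservedByRow[row].Distinct()
                .Append(-1)
                .Append(ColumnNames.Length)
                .OrderBy(c => c)
                .ToList();
            return bounds.Zip(bounds.Skip(1), (a, b) => (b - a - 1) / familySize).Sum();
        });
    }
}
```

With the question's input, CountFamilies(2, new[] { "1A", "2F", "1C" }) counts 1 family in row 1 and 2 in row 2. A row number missing from the lookup simply yields an empty sequence, which is why free rows need no special case.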
If you represent your matrix in a developer-friendly format, it will be easier. You can do that either with a dictionary or with a not-so-complex mapping by hand. In any case, this calculates the count of free consecutive seats:
public static void Main(string[] args)
{
var count = 0;//total count
var N = 2; //rows
var M = 10; //columns
var familySize = 4;
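// Reserved seats as 0-based (row, column): 1A -> (0,0), 2F -> (1,5), 1C -> (0,2). Needs using System; and using System.Linq;
var matrix = new []{Tuple.Create(0,0),Tuple.Create(1,5), Tuple.Create(0,2)}.OrderBy(x=> x.Item1).ThenBy(x=> x.Item2).GroupBy(x=> x.Item1, x=> x.Item2);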
foreach(var row in matrix)
{
var prevColumn = -1;
var currColumn = 0;
var free = 0;
var div = 0;
//Instead of enumerating entire matrix, we just calculate intervals in between reserved seats.
//Then we divide them by family size to know how many families can be contained within
foreach(var column in row)
{
currColumn = column;
free = (currColumn - prevColumn - 1)/familySize;
count += free;
prevColumn = currColumn;
}
currColumn = M;
free = (currColumn - prevColumn - 1)/familySize;
count += free;
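}
// Rows with no reservations never show up in the grouping; each of them fits M / familySize families.
count += (N - matrix.Count()) * (M / familySize);
Console.WriteLine("Result: {0}", count);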
}
I am planning to get an array of the averages of each column.
But my app crashes at sum[j] += int.Parse(csvArray[i,j]); due to a FormatException. I have tried using Convert.ToDouble and Double.Parse but it still throws that exception.
The increments in the for loop start at 1 because Row 0 and Column 0 of the CSV array are strings (names and timestamps). For the divisor, i.e. the total count of fields that have values per column, I only count the fields that are not BLANK, hence the IF statement. I think I need help with handling the exception.
Below is my existing code for the method that computes the averages.
public void getColumnAverages(string filePath)
{
int col = colCount(filePath);
int row = rowCount(filePath);
string[,] csvArray = csvToArray(filePath);
int[] count = new int[col];
int[] sum = new int[col];
double[] average = new double[col];
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
if (csvArray[i,j] != " ")
{
sum[j] += int.Parse(csvArray[i,j]);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach(double d in average)
{
System.Diagnostics.Debug.Write(d);
}
}
}
I have uploaded the CSV file that I use when I test the prototype. It has BLANK values on some columns. Was my existing IF statement unable to handle that case?
There are also entries like 1.324556-e09 due to the number of decimals, I think. I guess I have to trim them in the csvToArray(filePath) method, or are there other efficient ways? Thanks a million!
So there are a few problems with your code. The main reason for your FormatException is that, judging by your CSV file, your numbers are surrounded by quotes. I can't see from your code exactly how you convert your CSV file to an array, but I'm guessing that you don't strip these out; I didn't when I first ran with your CSV, and I hit exactly the same error.
I then ran into an error because some of the values in your CSV are decimals, so the int type can't be used. I'm assuming you still want the averages of these columns, so in my slightly revised version of your method I changed the arrays to type double.
As @musefan suggested, I have also changed the check for empty cells to use the IsNullOrWhiteSpace method.
Finally, when you output your results you receive NaN for the first value in the averages array. This is because you never populate the first position of your arrays (index 0 is skipped so as not to process the string values), so 0 / 0 produces NaN. I'm not sure how you'd like to correct this since I don't know the intended purpose (it might be okay), so I've not changed it for the moment; pop a mention in the comments if you want help sorting this out!
So here is the updated method:
public static void getColumnAverages(string filePath)
{
// Differs from the current implementation, reads a file in as text and
// splits by a defined delim into an array
var csvArray = File.ReadLines(filePath).Select(x => x.Split(',')).ToArray(); // needs using System.IO; and using System.Linq;
// Differs from the current implementation
var col = csvArray[0].Length;
var row = csvArray.Length;
// Update variables to use doubles
double[] count = new double[col];
double[] sum = new double[col];
double[] average = new double[col];
Console.WriteLine("Started");
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
// Remove the quotes from your array
var current = csvArray[i][j].Replace("\"", "");
// Added the Method IsNullOrWhiteSpace
if (!string.IsNullOrWhiteSpace(current))
{
// Parse as double not int to account for dec. values
sum[j] += double.Parse(current);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach (double d in average)
{
System.Diagnostics.Debug.Write(d + "\n");
}
}
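If you would rather not throw on unexpected cells at all, one variation is double.TryParse with the invariant culture. This is a sketch under assumptions: CsvAverages and ColumnAverages are hypothetical names, the input is the jagged array produced by File.ReadLines(...).Select(x => x.Split(',')).ToArray() above, and a value written as 1.324556e-09 (the exponent form .NET recognizes) parses fine, whereas 1.324556-e09 as typed in the question would simply be skipped. Returning double? lets the label column and columns with no numbers come back as null instead of NaN.

```csharp
using System.Globalization;
using System.Linq;

public static class CsvAverages
{
    // Averages every column except column 0, skipping row 0 (headers),
    // mirroring the loops above. Cells are stripped of surrounding quotes;
    // blank or non-numeric cells are skipped instead of throwing.
    public static double?[] ColumnAverages(string[][] rows)
    {
        int cols = rows[0].Length;
        var averages = new double?[cols]; // averages[0] stays null (label column)

        for (int j = 1; j < cols; j++)
        {
            var values = rows.Skip(1)
                .Where(r => j < r.Length)
                .Select(r => r[j].Trim().Trim('"'))
                // NumberStyles.Float accepts a sign, a decimal point and an exponent;
                // InvariantCulture makes '.' the decimal separator regardless of locale.
                .Select(s => double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out double v)
                    ? v : (double?)null)
                .Where(v => v.HasValue)
                .Select(v => v.Value)
                .ToList(); // evaluated now, while j still has this iteration's value

            // No numeric cells: report null rather than 0 / 0 = NaN.
            averages[j] = values.Count > 0 ? values.Average() : (double?)null;
        }
        return averages;
    }
}
```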
Below is a crude for-loop to illustrate what I need to do.
Basically, if there are any 'Variable' objects with property 'Name' containing the text "TCC#", then I want to change the 'Type' property (not the .Net type) to 'VariableType.Text'.
The code is going to run over 4800 ParsedCard variables and currently takes a stupid amount of time (about 10 minutes) to simply iterate through the list and write a line to the Debug console.
ParsedCard has
IEnumerable functions which have
IEnumerable groups which have
ParseResults which have
IEnumerable variables
This is such a simple problem but I've tried all sorts of variations using LINQ but can't find anything that performs well (less than 10 seconds).
private void AdjustTCCVariables(IList<ParsedCard> parsedCards)
{
for (var i = 0; i < parsedCards.Count; i++)
{
var parsedCard = parsedCards[i];
for (var j = 0; j < parsedCard.Functions.Count(); j++)
{
var function = parsedCard.Functions.ToList()[j];
for (var k = 0; k < function.Groups.Count(); k++)
{
var group = function.Groups.ToList()[k];
for (var l = 0; l < group.ParseResult.Variables.Count(); l++)
{
var variable = group.ParseResult.Variables.ToList()[l];
if (variable.Name.Contains("TCC#"))
{
//variable.Type = VariableType.Text;
Debug.WriteLine($"Need to change variable at [{i}][{j}][{k}][{l}]");
}
}
}
}
}
}
I've tried this LINQ, but it doesn't actually change the 'variable.Type' of the input list (I suspect it creates a new copy of the objects in memory, so the assignment doesn't actually affect the 'parsedCards' IEnumerable at all):
private void AdjustTCCVariables(IEnumerable<ParsedCard> parsedCards)
{
var targetVariables =
parsedCards.SelectMany(x => x.Functions.SelectMany(z => z.Groups))
.SelectMany(x => x.ParseResult.Variables.Where(v => v.Name.Contains("TCC#")));
foreach (var variable in targetVariables)
{
variable.Type = VariableType.Text;
}
}
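As mentioned, the bottleneck in your iterations is the .ToList() calls: each one copies the whole sequence, and they run (along with Count() in the loop conditions) on every pass of the loops, so each loop that should be linear becomes quadratic in the length of its sequence, and worse if those IEnumerables are lazy queries that recompute on every enumeration.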
Since you mention that you only want to edit the variable.Type property, I would solve this like this.
var variables = from parsedCard in parsedCards
from function in parsedCard.Functions
from group in function.Groups
from variable in group.ParseResult.Variables
where variable.Name.Contains("TCC#")
select variable;
foreach (var variable in variables) {
variable.Type = VariableType.Text;
}
You don't need to know anything other than the variable objects that need changing, you don't need all the indexes and all the other variables. Just select what you need to know, and change it.
This way you will not know the indexes, so your Debug.WriteLine(...); line won't work.
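If you do want the indexes, you can still keep them without any ToList() by using the index overload of Select. A sketch, assuming the objects are reference types as in the question (the minimal ParsedCard, Function, Group, ParseResult, Variable and VariableType types below are hypothetical stand-ins, since the real definitions are not shown). Because the query yields the same objects that live in parsedCards, setting Type changes the original data; LINQ does not copy class instances, unless Variables is a lazily generated sequence that creates new objects on every enumeration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for the question's types.
public enum VariableType { Number, Text }
public class Variable { public string Name { get; set; } = ""; public VariableType Type { get; set; } }
public class ParseResult { public IEnumerable<Variable> Variables { get; set; } = new List<Variable>(); }
public class Group { public ParseResult ParseResult { get; set; } = new ParseResult(); }
public class Function { public IEnumerable<Group> Groups { get; set; } = new List<Group>(); }
public class ParsedCard { public IEnumerable<Function> Functions { get; set; } = new List<Function>(); }

public static class TccAdjuster
{
    // Walks each sequence exactly once (no ToList()/Count() per iteration) while
    // still recording the [i][j][k][l] position via Select's index overload.
    public static List<string> Adjust(IEnumerable<ParsedCard> parsedCards)
    {
        var hits =
            from card in parsedCards.Select((c, i) => (c, i))
            from fn in card.c.Functions.Select((f, j) => (f, j))
            from grp in fn.f.Groups.Select((g, k) => (g, k))
            from v in grp.g.ParseResult.Variables.Select((x, l) => (x, l))
            where v.x.Name.Contains("TCC#")
            select (Variable: v.x, Path: $"[{card.i}][{fn.j}][{grp.k}][{v.l}]");

        var changed = new List<string>();
        foreach (var hit in hits)
        {
            hit.Variable.Type = VariableType.Text; // same object as in parsedCards
            changed.Add(hit.Path);                 // e.g. for Debug.WriteLine
        }
        return changed;
    }
}
```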
Without knowing the definition of the classes, here are some tips.
Remove the ToList() calls, and don't call Count() in the loop condition (compute it once before the for statement):
// Assumes Functions, Groups and Variables are indexable (e.g. IList<T>), not plain IEnumerable<T>
int numberOf = parsedCards.Count;
for (var i = 0; i < numberOf; i++)
{
//var parsedCard = parsedCards[i];
int noOf2 = parsedCards[i].Functions.Count();
for (var j = 0; j < noOf2; j++)
{
var function = parsedCards[i].Functions[j];
int noOfGroups = function.Groups.Count();
for (var k = 0; k < noOfGroups; k++)
{
var group = function.Groups[k];
int noOfVars = group.ParseResult.Variables.Count();
for (var l = 0; l < noOfVars; l++)
{
var variable = group.ParseResult.Variables[l];
if (variable.Name.Contains("TCC#"))
{
//variable.Type = VariableType.Text;
Debug.WriteLine($"Need to change variable at [{i}][{j}][{k}][{l}]");
}
}
}
}
}
I have a C# console program and I am trying to sort "File3" (contains numbers) in ascending order and output the lines from 3 text files.
So the outcome looks something like this:
===========================================================================
field1.....................field2.....................field3
===========================================================================
[FILE1_LINE1]..............[FILE2_LINE1]..............[FILE3_LINE1]
[FILE1_LINE2]..............[FILE2_LINE2]..............[FILE3_LINE2]
[FILE1_LINE3]..............[FILE2_LINE3]..............[FILE3_LINE3]
and so on...
At the moment it kinda works, I think, but it seems to duplicate the first two lines. Could someone give an example of better code, please?
Here is the code that I have atm:
string[] File1 = System.IO.File.ReadAllLines(@"FILE1.txt");
string[] File2 = System.IO.File.ReadAllLines(@"FILE2.txt");
string[] File3 = System.IO.File.ReadAllLines(@"FILE3.txt");
decimal[] File3_1 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_1[i] = decimal.Parse(File3[i]);
}
decimal[] File3_2 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_2[i] = decimal.Parse(File3[i]);
}
decimal number = 0;
for (double i = 0.00; i < File3_1.Length; i++)
{
for (int sort = 0; sort < File3_1.Length - 1; sort++)
{
if (File3_1[sort] > File3_1[sort + 1])
{
number = File3_1[sort + 1];
File3_1[sort + 1] = File3_1[sort];
File3_1[sort] = number;
}
}
}
if (SortChoice2 == 1)
{
for (int y = 0; y < File3_2.Length; y++)
{
for (int s = 0; s < File3_2.Length; s++)
{
if (File3_1[y] == File3_2[s])
{
Console.WriteLine(File1[s] + File2[s] + File3_1[y]);
}
}
}
}
Just for more info, most of this code was used for another program and worked but in my new program, this doesn't as I've said above - ("it repeats a couple of lines for some reason"). I'm kinda an amateur/ rookie at C# so I only get stuff like this to work with examples.
Thanks in advance :)
Ok, if I understand correctly, what you are trying to do is read the lines from 3 different files, each of them representing a different "field" in a table. You then want to sort this table based on the values of one of the fields (in your code, this seems to be the field whose values are contained in File3). Well, if I got that right, here's what I suggest you do:
// Read data from files
List<string> inputFileNames = new List<string> {"File1.txt", "File2.txt", "File3.txt"};
decimal[][] fieldValues = new decimal[inputFileNames.Count][];
for (int i = 0; i < inputFileNames.Count; i++)
{
string currentInputfileName = inputFileNames[i];
string[] currentInputFileLines = File.ReadAllLines(currentInputfileName);
fieldValues[i] = new decimal[currentInputFileLines.Length];
for (int j = 0; j < currentInputFileLines.Length; j++)
{
fieldValues[i][j] = decimal.Parse(currentInputFileLines[j]);
}
}
// Create table
DataTable table = new DataTable();
DataColumn field1Column = table.Columns.Add("field1", typeof (decimal));
DataColumn field2Column = table.Columns.Add("field2", typeof (decimal));
DataColumn field3Column = table.Columns.Add("field3", typeof (decimal));
for (int i = 0; i < fieldValues[0].Length; i++)
{
var newTableRow = table.NewRow();
newTableRow[field1Column.ColumnName] = fieldValues[0][i];
newTableRow[field2Column.ColumnName] = fieldValues[1][i];
newTableRow[field3Column.ColumnName] = fieldValues[2][i];
table.Rows.Add(newTableRow);
}
// Sorting
table.DefaultView.Sort = field3Column.ColumnName; // sort by the values read from File3
// Output
foreach (DataRow row in table.DefaultView.ToTable().Rows)
{
foreach (var item in row.ItemArray)
{
Console.Write(item + " ");
}
Console.WriteLine();
}
Now, I tried to keep the code above as LINQ-free as I could, since you do not seem to be using it in your example and therefore might not know about it. That being said, while there are a thousand ways to do I/O in C#, LINQ would help you a lot in this instance (and in pretty much any other situation, really), so I suggest you look it up if you don't know about it already.
Also, the DataTable option I proposed is just a way for you to visualize and organize the data more efficiently. That being said, you are in no way obliged to use a DataTable: you could stay with a more direct approach and use more common data structures (such as lists, arrays or even dictionaries, if you know what they are) to store the data, depending on your needs. It's just that with a DataTable you don't, for example, need to do the sorting yourself or deal with columns indexed only by integers. With time, you'll come to learn about the myriad of useful data structures and built-in features C# offers, and how they can save you from doing the work yourself in a lot of cases.
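A likely cause of the repeated lines in the question: the final nested loop prints a line for every s where File3_2[s] equals File3_1[y], so a value that appears k times in File3 is printed k × k times. A minimal sketch that avoids this by sorting line indexes instead of values (ThreeFileSort and OrderByFile3 are hypothetical names; it assumes every line of File3 is a number written with '.' as the decimal separator, and that File1 and File2 have at least as many lines):

```csharp
using System.Globalization;
using System.Linq;

public static class ThreeFileSort
{
    // Returns line indexes ordered by the number on the matching line of File3.
    // Each index appears exactly once, so every line is printed once even when
    // File3 contains duplicate numbers. OrderBy is stable: ties keep file order.
    public static int[] OrderByFile3(string[] file3Lines)
    {
        decimal[] keys = file3Lines
            .Select(s => decimal.Parse(s, CultureInfo.InvariantCulture))
            .ToArray();
        return Enumerable.Range(0, keys.Length)
            .OrderBy(i => keys[i])
            .ToArray();
    }
}
```

To print with the question's arrays, loop over the returned indexes: foreach (int i in ThreeFileSort.OrderByFile3(File3)) Console.WriteLine(File1[i] + File2[i] + File3[i]);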
I have an array in C# that is 1-based (generated from a call to get_Value on an Excel Range).
I get a 2D array for example
object[,] ExcelData = (object[,]) MySheet.UsedRange.get_Value(Excel.XlRangeValueDataType.xlRangeValueDefault);
this appears as an array for example ExcelData[1..20,1..5]
is there any way to tell the compiler to rebase this so that I do not need to add 1 to loop counters the whole time?
List<string> RowHeadings = new List<string>();
string [,] Results = new string[MaxRows, 1];
for (int Row = 0; Row < MaxRows; Row++) {
if (ExcelData[Row+1, 1] != null)
RowHeadings.Add(ExcelData[Row+1, 1].ToString());
// ...
// ...
Results[Row, 0] = ExcelData[Row+1, 1]?.ToString();
// & other stuff in here that requires a 0-based Row
}
It makes things less readable, since the arrays I create for writing back are zero-based.
Why not just change your index?
List<string> RowHeadings = new List<string>();
for (int Row = 1; Row <= MaxRows; Row++) {
if (ExcelData[Row, 1] != null)
RowHeadings.Add(ExcelData[Row, 1].ToString());
}
Edit: Here is an extension method that would create a new, zero-based array from your original one (basically it just creates a new array that is one element smaller and copies to that new array all elements but the first element that you are currently skipping anyhow):
public static T[] ToZeroBasedArray<T>(this T[] array)
{
int len = array.Length - 1;
T[] newArray = new T[len];
Array.Copy(array, 1, newArray, 0, len);
return newArray;
}
That being said you need to consider if the penalty (however slight) of creating a new array is worth improving the readability of the code. I am not making a judgment (it very well may be worth it) I am just making sure you don't run with this code if it will hurt the performance of your application.
Create a wrapper for the ExcelData array with a this[,] indexer and do rebasing logic there. Something like:
class ExcelDataWrapper
{
private object[,] _excelData;
public ExcelDataWrapper(object[,] excelData)
{
_excelData = excelData;
}
public object this[int x, int y]
{
// Callers use 0-based indexes; the array returned by Excel is 1-based
get { return _excelData[x+1, y+1]; }
}
}
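A variation on this wrapper reads the lower bounds from the array instead of hard-coding +1, so it also behaves correctly if it is handed an array that is already 0-based. A sketch: ZeroBasedView is a hypothetical name, and it only wraps reads; no data is copied, which sidesteps the copying cost discussed in the other answers.

```csharp
using System;

// Read-only, zero-based view over any 2-D array, whatever its lower bounds.
public sealed class ZeroBasedView<T>
{
    private readonly T[,] _source;
    private readonly int _rowBase, _colBase;

    public ZeroBasedView(T[,] source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _rowBase = source.GetLowerBound(0); // 1 for arrays returned by Excel
        _colBase = source.GetLowerBound(1);
    }

    public int Rows => _source.GetLength(0);
    public int Columns => _source.GetLength(1);

    // Each access just shifts the indexes by the stored lower bounds.
    public T this[int row, int col] => _source[row + _rowBase, col + _colBase];
}
```

Usage with the question's data would be var data = new ZeroBasedView<object>(ExcelData); after which data[0, 0] reads ExcelData[1, 1].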
Since you need Row to remain as-is (based on your comments), you could just introduce another loop variable:
List<string> RowHeadings = new List<string>();
string [,] Results = new string[MaxRows, 1];
for (int Row = 0, SrcRow = 1; SrcRow <= MaxRows; Row++, SrcRow++) {
if (ExcelData[SrcRow, 1] != null)
RowHeadings.Add(ExcelData[SrcRow, 1].ToString());
// ...
// ...
Results[Row, 0] = ExcelData[SrcRow, 1]?.ToString();
}
Why not use:
for (int Row = 1; Row <= MaxRows; Row++) {
Or is there something I'm missing?
EDIT: as it turns out something is missing, I would use another counter (starting at 0) for that purpose, and a 1-based Row index for the array. It's not good practice to use an index for anything other than indexing the target array.
Is changing the loop counter too hard for you?
for (int Row = 1; Row <= MaxRows; Row++)
If the counter's range is right, you don't have to add 1 to anything inside the loop so you don't lose readability. Keep it simple.
I agree that working with base-1 arrays from .NET can be a hassle. It is also potentially error-prone, as you have to mentally make a shift each time you use it, as well as correctly remember which situations will be base 1 and which will be base 0.
The most performant approach is to simply make these mental shifts and index appropriately, using base-1 or base-0 as required.
I personally prefer to convert the two dimensional base-1 arrays to two dimensional base-0 arrays. This, unfortunately, requires the performance hit of copying over the array to a new array, as there is no way to re-base an array in place.
Here's an extension method that can do this for the 2D arrays returned by Excel:
public static T[,] CloneBase0<T>(
this T[,] sourceArray)
{
if (sourceArray == null)
{
throw new ArgumentNullException(
nameof(sourceArray), "The 'sourceArray' is null, which is invalid.");
}
int numRows = sourceArray.GetLength(0);
int numColumns = sourceArray.GetLength(1);
T[,] resultArray = new T[numRows, numColumns];
int lb1 = sourceArray.GetLowerBound(0);
int lb2 = sourceArray.GetLowerBound(1);
for (int r = 0; r < numRows; r++)
{
for (int c = 0; c < numColumns; c++)
{
resultArray[r, c] = sourceArray[lb1 + r, lb2 + c];
}
}
return resultArray;
}
And then you can use it like this:
object[,] array2DBase1 = (object[,]) MySheet.UsedRange.get_Value(Type.Missing);
object[,] array2DBase0 = array2DBase1.CloneBase0();
for (int row = 0; row < array2DBase0.GetLength(0); row++)
{
for (int column = 0; column < array2DBase0.GetLength(1); column++)
{
// Your code goes here...
}
}
For massively sized arrays, you might not want to do this, but I find that, in general, it really cleans up your code (and mind-set) to make this conversion, and then always work in base-0.
Hope this helps...
Mike
For 1-based arrays and Excel range operations, as well as UDF (SharePoint) functions, I use this utility function:
public static object[,] ToObjectArray(this Object Range)
{
Type type = Range.GetType();
if (type.IsArray && type.Name == "Object[,]")
{
var sourceArray = Range as Object[,];
int lb1 = sourceArray.GetLowerBound(0);
int lb2 = sourceArray.GetLowerBound(1);
if (lb1 == 0 && lb2 == 0)
{
return sourceArray;
}
else
{
int numRows = sourceArray.GetLength(0);
int numColumns = sourceArray.GetLength(1);
var resultArray = new Object[numRows, numColumns];
for (int r = 0; r < numRows; r++)
{
for (int c = 0; c < numColumns; c++)
{
resultArray[r, c] = sourceArray[lb1 + r, lb2 + c];
}
}
return resultArray;
}
}
else if (type.IsCOMObject)
{
// Get the Value2 property from the object.
Object value = type.InvokeMember("Value2",
System.Reflection.BindingFlags.Instance |
System.Reflection.BindingFlags.Public |
System.Reflection.BindingFlags.GetProperty,
null,
Range,
null);
if (value == null)
value = string.Empty;
if (value is string)
return new object[,] { { value } };
else if (value is double)
return new object[,] { { value } };
else
{
object[,] range = (object[,])value;
int rows = range.GetLength(0);
int columns = range.GetLength(1);
object[,] param = new object[rows, columns];
Array.Copy(range, param, rows * columns);
return param;
}
}
else
throw new ArgumentException("Not an Excel Range COM object");
}
Usage
public object[,] RemoveZeros(object range)
{
return this.RemoveZeros(range.ToObjectArray());
}
[ComVisible(false)]
[UdfMethod(IsVolatile = false)]
public object[,] RemoveZeros(Object[,] range)
{...}
The first function is com visible and will accept an excel range or a chained call from another function (the chained call will return a 1 based object array), the second call is UDF enabled for Excel Services in SharePoint. All of the logic is in the second function. In this example we are just reformatting a range to replace zero with string.empty.
You could use a 3rd party Excel compatible component such as SpreadsheetGear for .NET which has .NET friendly APIs - including 0 based indexing for APIs such as IRange[int rowIndex, int colIndex].
Such components will also be much faster than the Excel API in almost all cases.
Disclaimer: I own SpreadsheetGear LLC