Skip columns while reading big excel file using Open Xml Sax approach


I am reading several medium-sized Excel files, approximately 50 columns x 500 rows. The problem is that some formatting has been dragged out to column XFD, which is column number 16384 = 2^16/4 = 2^14. With OleDb this does not cause any problems, as the following query lets me select only a subset of the spreadsheet without a large performance loss from the remaining columns:
SELECT * FROM [SheetNameA1:BC500]
This takes around 3 seconds. The problem with OleDb is that it requires Windows and a file on disk; both cause trouble with our cloud infrastructure, so we would like to use OpenXml instead. OpenXml can be used with a DOM approach or a SAX approach. The first is a showstopper, because calling the Worksheet getter on WorksheetPart loads the whole XML with all columns, which takes around 10 seconds.
The SAX approach gives me five methods for navigating an OpenXmlReader: LoadCurrentElement, Read, ReadFirstChild, ReadNextSibling and Skip. Using these I can:
1. use Read until I hit a Row element
2. use ReadFirstChild to reach the first Cell element and ReadNextSibling to reach the remaining ones, loading each with LoadCurrentElement until column BC
3. use ReadNextSibling until the whole Row is read (ignoring content, i.e. no call to LoadCurrentElement)
The performance loss is in the last step. How can I make the reader jump to the next row without looping through all the cells?
I think the key might be to use Skip to skip over all children. The problem is that I need to be positioned on the Row element to skip all of its Cell elements, and there is no way to "rewind".
Here is an example I made to illustrate the problem. The Excel file is simply filled with x in the range A1:XFD500. The code below measures the while-time and the load-time:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Diagnostics;
using System.Text.RegularExpressions;
using (var file = File.Open("testfile.xlsx", FileMode.Open, FileAccess.Read))
{
var doc = SpreadsheetDocument.Open(file, false);
var workbookPart = doc.WorkbookPart;
var sheet = doc
.WorkbookPart
.Workbook
.Descendants<Sheet>()
.First(s => s.Name == "sheetName");
var worksheetPart = (WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id);
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
Dictionary<int, string> sharedStringCache = new Dictionary<int, string>();
var rows = new List<List<object>>();
int i = 0;
foreach (var el in workbookPart.SharedStringTablePart.SharedStringTable.ChildElements)
{
sharedStringCache.Add(i++, el.InnerText);
}
TimeSpan whileTime = TimeSpan.Zero;
TimeSpan loadTime = TimeSpan.Zero;
var stopwatch1 = new Stopwatch();
var stopwatch2 = new Stopwatch();
int lastColumnWithData = 50;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
List<object> cells = new List<object>();
do
{
if (reader.ElementType == typeof(Cell))
{
stopwatch2.Restart();
Cell c = (Cell)reader.LoadCurrentElement();
loadTime += stopwatch2.Elapsed;
var columnLetters = Regex.Replace(c.CellReference, @"[\d]", string.Empty).ToUpper();
var columnIndex = NumberFromExcelColumn(columnLetters);
var rowIndex = int.Parse(Regex.Replace(c.CellReference, @"[^\d]", string.Empty).ToUpper());
if (columnIndex > lastColumnWithData)
{
stopwatch1.Restart();
while (reader.ReadNextSibling()) {}
whileTime += stopwatch1.Elapsed;
break;
}
object value;
switch (c.DataType?.Value)
{
case CellValues.Boolean:
value = bool.Parse(c.CellValue.InnerText);
break;
case CellValues.Date:
value = DateTime.Parse(c.CellValue.InnerText);
break;
case CellValues.Number:
value = double.Parse(c.CellValue.InnerText);
break;
case CellValues.InlineString:
case CellValues.String:
value = c.CellValue.InnerText;
break;
case CellValues.SharedString:
value = sharedStringCache[int.Parse(c.CellValue.InnerText)];
break;
default:
value = c.CellValue.InnerText;
break;
}
if (value != null)
cells.Add(value);
}
} while (reader.ReadNextSibling());
if (cells.Any())
rows.Add(cells);
}
}
}
static int NumberFromExcelColumn(string column)
{
int retVal = 0;
string col = column.ToUpper();
for (int iChar = col.Length - 1; iChar >= 0; iChar--)
{
char colPiece = col[iChar];
int colNum = colPiece - 64;
retVal = retVal + colNum * (int)Math.Pow(26, col.Length - (iChar + 1));
}
return retVal;
}
made using examples from:
How to open a huge excel file efficiently
Fastest function to generate Excel column letters in C#
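The column and row of a cell reference can also be extracted in a single pass, without Regex or Math.Pow. The following is a minimal sketch; the class and method names (CellRef, Parse) are hypothetical and not part of the Open XML SDK, and it assumes plain A1-style references without `$` signs.

```csharp
using System;
using System.Globalization;

public static class CellRef
{
    // Splits an A1-style reference such as "BC12" into a 1-based column and row.
    public static (int Column, int Row) Parse(string reference)
    {
        int column = 0;
        int i = 0;
        // The leading letters form a base-26 number with digits A=1 .. Z=26.
        for (; i < reference.Length; i++)
        {
            char ch = char.ToUpperInvariant(reference[i]);
            if (ch < 'A' || ch > 'Z') break;
            column = column * 26 + (ch - 'A' + 1);
        }
        if (column == 0 || i == reference.Length)
            throw new FormatException("Not a cell reference: " + reference);
        // Everything after the letters is the row number.
        int row = int.Parse(reference.Substring(i), CultureInfo.InvariantCulture);
        return (column, row);
    }
}
```

With this, `CellRef.Parse(c.CellReference).Column` could stand in for the two Regex.Replace calls and NumberFromExcelColumn in the loop above.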

The Skip() method skips the child elements of the current node.
It can be used when the currently loaded element is the parent Row.
For the example above, it would be simpler to break out of the do-while loop once the column count is reached. Using break skips any remaining siblings and moves on to the next element whose type is Row.
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
int lastColumnWithData = 50;
int columnCount = 0;
do
{
if (reader.ElementType == typeof(Cell))
{
columnCount += 1;
}
} while (reader.ReadNextSibling() && columnCount <= lastColumnWithData);
}
}
Or, to move to the last child element before exiting the iteration, try the following.
do
{
if (reader.ElementType == typeof(Cell))
{
columnCount += 1;
if (columnCount > lastColumnWithData)
{
// skip to last element
while (reader.ReadNextSibling() && !reader.IsEndElement)
{
};
break;
}
}
} while (reader.ReadNextSibling());
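A different technique, swapped in here purely for illustration, is to read the worksheet XML with System.Xml.XmlReader and give each row its own subtree reader. Disposing a subtree reader positions the outer reader on the row's end tag, so the loop can stop at the last wanted column and continue with the next row. This is a sketch under assumptions: the names RowSkipSketch and ReadRows are hypothetical, the input is the raw worksheet XML (for example from worksheetPart.GetStream()), and the returned values are the raw `<v>` texts, so shared strings still need the lookup shown in the question. The rest of the row still has to be scanned as XML, only without creating Cell objects, so whether it is fast enough would need measuring.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class RowSkipSketch
{
    const string Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns, per row, the raw <v> text of every cell whose column is <= lastColumn.
    public static List<List<string>> ReadRows(TextReader sheetXml, int lastColumn)
    {
        var rows = new List<List<string>>();
        using XmlReader reader = XmlReader.Create(sheetXml);
        while (reader.ReadToFollowing("row", Ns))
        {
            var cells = new List<string>();
            // The subtree reader is confined to the current <row> element.
            using (XmlReader row = reader.ReadSubtree())
            {
                while (row.ReadToFollowing("c", Ns))
                {
                    if (ColumnIndex(row.GetAttribute("r")) > lastColumn)
                        break; // leave the rest of the row unread
                    // Cells without a <v> child are recorded as null.
                    cells.Add(row.ReadToDescendant("v", Ns)
                        ? row.ReadElementContentAsString()
                        : null);
                }
            } // disposing the subtree reader moves the outer reader to </row>
            rows.Add(cells);
        }
        return rows;
    }

    // Converts the letters of a reference such as "BC12" to a 1-based column index.
    static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference ?? string.Empty)
        {
            if (ch < 'A' || ch > 'Z') break; // the first digit ends the column letters
            index = index * 26 + (ch - 'A' + 1);
        }
        return index;
    }
}
```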

Related

CSV Helper: Parsing null vs empty cells

I am using CsvHelper to parse a CSV file.
I am having trouble telling a null (empty) cell apart from a cell that contains only whitespace (one or more spaces).
When the user puts a single space in a cell and uploads the file, CsvHelper trims that value, so it is passed as "".
When the user leaves the cell empty, it is also passed as "".
So what I want is:
- Nulls should not be allowed to be uploaded.
- One or more spaces in a cell is allowed.
How can I achieve this using CsvHelper? Below is my sample code:
using (TextReader fileReader = new StreamReader(file.OpenReadStream()))
{
var configuration = new Configuration
{
HasHeaderRecord = parameter.HasHeader,
Delimiter = parameter.Delimiter.ToString(),
Quote = parameter.Quote
};
using (var csv = new CsvReader(fileReader, configuration))
{
for (int rowIndex = 0; await csv.ReadAsync(); rowIndex++)
{
var record = csv.GetRecord<dynamic>() as IDictionary<string, object>;
string[] row = record?.Select(i => i.Value as string).ToArray();
for (int i = 0; i < row.Length; i++)
{
//process rows
}
}
}
}
Below is the csv example:
"1"," ","1"
"2","0"," "
"3","","1"
In the CSV above, the first row has a single space in the second column, which should be allowed.
The third row has a null in the second column, which should not be allowed.
Is anything missing in my code, or is there a workaround to handle this?
Thanks

CsvHelper 15.0.3
With the following code I see a space where there is a space and an empty value where the cell is empty.
Maybe there is something else going on?
static async Task Main(string[] args)
{
// await the task so the program does not exit before processing finishes
await ProcessRecords();
}
static async Task ProcessRecords()
{
using (var reader = new StringReader("\"1\",\" \",\"1\"\n\"2\",\"0\",\" \"\n\"3\",\"\",\"1\""))
{
var configuration = new CsvHelper.Configuration.CsvConfiguration(CultureInfo.InvariantCulture)
{
HasHeaderRecord = false,
Delimiter = ",",
Quote = '"'
};
using (var csv = new CsvReader(reader, configuration))
{
for (int rowIndex = 0; await csv.ReadAsync(); rowIndex++)
{
Console.WriteLine($"Row: {rowIndex}");
var record = csv.GetRecord<dynamic>() as IDictionary<string, object>;
string[] row = record?.Select(i => i.Value as string).ToArray();
for (int i = 0; i < row.Length; i++)
{
if (row[i] == " ")
{
Console.WriteLine("Has a space");
}
if (row[i] == "")
{
Console.WriteLine("Empty value");
}
}
}
Console.ReadKey();
}
}
}
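Once the values arrive untrimmed, as they do in the output above, the rule from the question (reject empty cells, accept cells that contain only spaces) reduces to a per-row check. A minimal sketch, with the hypothetical names CellRules and IsRowValid:

```csharp
using System;
using System.Collections.Generic;

public static class CellRules
{
    // True when no field is null or zero-length; whitespace-only fields pass.
    public static bool IsRowValid(IEnumerable<string> fields)
    {
        foreach (string field in fields)
        {
            // IsNullOrEmpty (not IsNullOrWhiteSpace) keeps " " as a valid value.
            if (string.IsNullOrEmpty(field)) return false;
        }
        return true;
    }
}
```

In the loop above this could be called as `CellRules.IsRowValid(row)` for each record; for the sample CSV, the first two rows would pass and the third would be rejected.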

C# Find empty cells and write them inside with ClosedXml

I have installed ClosedXml and have the following problem.
I have an Excel file that is already created and populated. I need to find the first blank row below the populated ones and write some data into it.
Example:
[A, 1] = name;
[B, 1] = surname;
The next row is empty, and I will pass some variables to populate its cells from left to right.
OpenFileDialog FileExcel = new OpenFileDialog();
if (FileExcel.ShowDialog() == DialogResult.OK)
{
try
{
var sr = new StreamReader(FileExcel.FileName);
}
catch (SecurityException ex)
{
MessageBox.Show($"Security error.\n\nError message: {ex.Message}\n\n" +
$"Details:\n\n{ex.StackTrace}");
}
}
using (var excelWorkbook = new XLWorkbook(FileExcel.FileName))
{
var nonEmptyDataRows = excelWorkbook.Worksheet(Convert.ToInt32(comboBox1.SelectedItem)).RowsUsed();
foreach (var dataRow in nonEmptyDataRows)
{
//for row number check
if (dataRow.RowNumber() >= 1 && dataRow.RowNumber() <= 100)
{
}
}
}

Use row.Cells(false) instead of row.Cells(); it does not skip over unused cells. Then you can simply check whether each cell is empty, for example with cell.IsEmpty().

You can do something like this:
int lastrow = worksheet.LastRowUsed().RowNumber();
var rows = worksheet.Rows(1, lastrow);
foreach (IXLRow row in rows)
{
foreach (IXLCell cell in row.Cells())
{
if (cell.IsEmpty())
{
//do something
}
}
}
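To write into the first blank row below the populated ones, as the question asks, the row number can be derived from LastRowUsed(). A minimal sketch, assuming the ClosedXML package is referenced; RowAppender and AppendRow are hypothetical names:

```csharp
using ClosedXML.Excel;

public static class RowAppender
{
    // Writes the values left to right into the first row below the used range
    // and returns that row's number.
    public static int AppendRow(IXLWorksheet sheet, params string[] values)
    {
        // LastRowUsed() returns null when the sheet has no used cells.
        int nextRow = (sheet.LastRowUsed()?.RowNumber() ?? 0) + 1;
        for (int i = 0; i < values.Length; i++)
        {
            sheet.Cell(nextRow, i + 1).Value = values[i];
        }
        return nextRow;
    }
}
```

After the call, the workbook still has to be saved (for example with Save() or SaveAs) for the change to reach the file.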

How retrieve each specific column's values by looping through rows using C# from excel?

I am editing uploaded Excel workbooks using C# with the same logic I used to apply in VBA. I am using Syncfusion to open the workbooks, but the code below does not let me read a whole column to apply the logic. Why?
public void AppendID(string excelFilePath, HttpResponse response)
{
using (ExcelEngine excelEngine = new ExcelEngine())
{
IApplication application = excelEngine.Excel;
application.DefaultVersion = ExcelVersion.Excel2007;
IWorkbook workbook = application.Workbooks.Open(excelFilePath);
workbook.Version = ExcelVersion.Excel97to2003;
workbook.Allow3DRangesInDataValidation = true;
//Accessing worksheet via name
IWorksheet worksheet = workbook.Worksheets[2];
When I try to define the range, the error "Two names not allowed" appears.
var prismaID = worksheet.UsedRange["C15:C"].Value;
var type = worksheet.UsedRange["F15:F"].Value;
var placements = worksheet.UsedRange["I15:I"].Value;
if (!type.Contains("PKG"))
{
placements = placements + prismaID;
}
worksheet.Range["G7"].Text = "Testing";
workbook.SaveAs(excelFilePath);
workbook.Close();
}
}
Logic:
Let's say I have three columns. How can I use the following logic to manipulate the UsedRange cells?
ID  Condition  Name     Output
1   Yes        Sarah    Sarah(1)
2   No         George   George
3   Yes        John(3)  John(3)
The logic to apply:
Append the 'ID' from the first column to the end of the 'Name' column, unless
the 'Condition' column contains 'No'
or the name already contains the same 'ID'.
Here is the VBA code:
With xlSheet
LastRow = xlSheet.UsedRange.Rows.Count
Set target = .Range(.Cells(15, 9), .Cells(LastRow, 9))
values = target.Value
Set ptype=.Range(.Cells(15,6),.Cells(LastRow,6))
pvalues=ptype.Value
For i = LBound(values, 1) To UBound(values, 1)
'if Statement for test keywords
If InStr(1,pvalues(i,1),"Package")= 0 AND InStr(1,pvalues(i,1),"Roadblock")= 0 Then
If Instr(values(I,1),.Cells(i + 15 - LBound(values, 1), 3)) = 0 Then
'If InStr(1,values(i,1),"(")=0 Then
values(i, 1) = values(i, 1) & "(" & .Cells(i + 15 - LBound(values, 1), 3) & ")"
End If
End If
Next
target.Value = values
End With

Your requirement can be achieved by appending column ID with column Name using XlsIO.
Please refer below code snippet for the same.
Code Snippet:
for(int row = 1; row<= worksheet.Columns[1].Count; row++)
{
if (string.Equals(worksheet[row, 2].Value, "yes", StringComparison.OrdinalIgnoreCase) && !worksheet[row, 3].Value.EndsWith(")"))
worksheet[row, 4].Value = worksheet[row, 3].Value + "(" + worksheet[row, 1].Value + ")";
else
worksheet[row, 4].Value = worksheet[row, 3].Value;
}
We have prepared a simple sample, which can be downloaded from the following link.
Sample Link: http://www.syncfusion.com/downloads/support/directtrac/general/ze/Sample859524528.zip
I work for Syncfusion.
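Independent of the spreadsheet library, the per-row rule from the question can be written as a pure function and tested against the sample table. This is a minimal sketch; NameFormatter and AppendId are hypothetical names. Like the VBA code, it treats a name that already contains the ID text anywhere as done, which is an assumption that may be too loose for short numeric IDs.

```csharp
using System;

public static class NameFormatter
{
    // Returns name + "(id)" unless the condition is "No" or the name already contains the id.
    public static string AppendId(string id, string condition, string name)
    {
        bool blocked = string.Equals(condition, "No", StringComparison.OrdinalIgnoreCase);
        // Mirrors the VBA InStr check: any occurrence of the id counts.
        bool alreadyThere = name.Contains(id);
        return blocked || alreadyThere ? name : name + "(" + id + ")";
    }
}
```

For the rows in the question this gives Sarah(1), George and John(3).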

I work with templates in Excel, and I developed this logic.
I pair the column names in the first row with each subsequent row, using the first cell of the row as the key, and bind the data in groups to a multi-value dictionary.
I use the function below, which can be adapted to skip rows before parsing so that you can target the proper row for binding. Book is the result of ExcelDataReader's AsDataSet().
public static MultiValueDictionary<string, ILookup<string, string>> ParseTemplate(string Sheet, ref List<string> keys)
{
int xskip = 0;
MultiValueDictionary<string, ILookup<string, string>> mvd = new MultiValueDictionary<string, ILookup<string, string>>();
var sheetRows = Book.Tables[Sheet];
//Parse First row
var FirstRow = sheetRows.Rows[0];
for (var Columns = 0; Columns < sheetRows.Columns.Count; Columns++)
{
if (xskip == 0)
{
xskip = 1;
continue;
}
keys.Add(FirstRow[Columns].ToString());
}
//Skip First Row
xskip = 0;
//Create a binding of first row and all subsequent rows
foreach (var row in sheetRows.Select().Skip(1))
{
//Make the key the first cell of each row
var key = row[0];
List<string> rows = new List<string>();
foreach (var item in row.ItemArray)
{
if (xskip == 0)
{
xskip = 1;
continue;
}
rows.Add(item.ToString());
}
mvd.Add(key.ToString(), keys.Zip(rows, (m, n) => new { Key = m, Value = n }).ToLookup(x => x.Key, y => y.Value));
xskip = 0;
}
return mvd;
}
}
//This is example of what a function to parse this could do.
foreach(var Key in mvd.Keys)
{
var KeywithValues = mvd[Key];
foreach(ColumnName in Keys)
{
KeywithValues[ColumnName].
}
}
Hope it helps.

Reading a full table from Excel using Open XML ...FASTER

Warning: long post due to examples and results
There have been threads here about how to read an Open XML spreadsheet row with null cells in between columns. I drew some of my answers from the thread "reading Excel Open XML is ignoring blank cells".
I'm able to read a table just fine from xlsx, but it is 10 times slower than reading from CSV, while the Open XML structure should(?) yield superior results.
Here's my testing code base:
foreach (Row r in sheetData.Descendants<Row>())
{
sw.Start();
//find a row marked as "header" and get list of columns that define width of table
if (!headerRowFound)
{
headerRowFound = CheckOXMLHeaderRow(r, workbookPart, out headerReferences);
if (!headerRowFound)
continue;
}
rowKey++;
//////////////////////////////////////////////////////////////////////
///////////////////here we are going to do work//////////////////////
////////////////////////////////////////////////////////////////////
AddRow(rowKey, cols);
sw.Stop();
Debug.WriteLine("XLSX Row added in \t" + sw.ElapsedTicks.ToString() + "\tticks");
sw.Reset();
}
In my data a row is 68 cells, with only 5-10 of them filled out.
0. For comparison, going through CSV rows takes about 300 ticks (lightning fast). 5000 rows are added in 3ms
1. The code as is, going through the row enumerator only, takes 1-4 ticks
2. This code simply grabs all cells sequentially and stores them in a row (the column order is lost, because Open XML omits empty cells)
Hashtable cols = new Hashtable();
foreach (Cell c in r.Descendants<Cell>())
{
colKey++;
cols.Add(colKey, c);
}
//this takes about 8-10 times longer - 10-30 ticks , still lightning fast
3. If we know where to look for based on column(header) name and row number, we can do this
Hashtable cols = new Hashtable();
foreach (string column in headerReferences.Values)
{
colKey++;
cols.Add(colKey, GetCellValue(workbookPart, worksheetPart, column + r.RowIndex.ToString()));
}
This is one of the MSDN examples and it takes a whopping 500,000 ticks per row. It took several minutes to parse a 5,000-row spreadsheet. Not acceptable.
Here we were targeting EVERY cell in a row, existing or not, and each GetCellValue call scans all cells of the worksheet (Descendants<Cell>()) to find one address
4. I decided to scale back and retrieve the values of all incoming cells, out of order, into a Hashtable
Hashtable cols = new Hashtable();
foreach (Cell c in r.Descendants<Cell>())
{
colKey++;
cols.Add(colKey, GetValueFromCell(c, workbookPart));
}
This is now 500-1,500 ticks per row. Still lightning fast if we just store the values without any order (not a solution yet)
5. To preserve the order of columns, I make an empty clone of the header row for every new row, and after I parse the EXISTING cells, I decide where to put them based on a Hashtable lookup
Hashtable cols = (Hashtable)emptyNewRow.Clone();
foreach (Cell c in r.Descendants<Cell>())
{
colKey = headerReferences[GetColumnName(c.CellReference)]; //what # column is this?
cols[colKey] = GetValueFromCell(c, workbookPart); //put value in that column
}
The final result is 9,000-20,000 ticks per row, or 30s for a 5,000-row spreadsheet. Doable, but not ideal.
Here's where I stopped. Any ideas how to make it faster? How can humongous xlsx spreadsheets load so lightning fast when the best I can do here is 30s for 5k rows?
Dictionaries didn't do anything for me, not even a 1% improvement. And I need the result in Hashtables anyway for a legacy retrofit
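The placement in step 5 can be sketched without the SDK types: a pre-sized array of blanks is filled by column position, and the column letters are taken from the cell reference by trimming the trailing digits instead of constructing a Regex per cell. SparseRow and Place are hypothetical names, the header map is assumed to hold positions 0..Count-1, and the sketch returns an array rather than the Hashtable the legacy code needs. No speed-up is claimed; it would have to be measured.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRow
{
    static readonly char[] Digits = "0123456789".ToCharArray();

    // headerColumns maps a column name ("A", "C", ...) to its position in the output row.
    // cells holds (cell reference, value) pairs for the cells that exist in the row.
    public static string[] Place(
        IReadOnlyDictionary<string, int> headerColumns,
        IEnumerable<KeyValuePair<string, string>> cells)
    {
        var row = new string[headerColumns.Count];
        for (int i = 0; i < row.Length; i++) row[i] = string.Empty; // blanks by default

        foreach (var cell in cells)
        {
            // "C12" -> "C": drop the row number at the end of the reference.
            string column = cell.Key.TrimEnd(Digits);
            if (headerColumns.TryGetValue(column, out int index))
                row[index] = cell.Value;
        }
        return row;
    }
}
```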
Appendix: referenced methods
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
public static string GetValueFromCell(Cell cell, WorkbookPart workbookPart)
{
int id;
string cellValue = cell.InnerText;
if (cellValue.Trim().Length > 0)
{
if (cell.DataType != null)
{
switch (cell.DataType.Value)
{
case CellValues.SharedString:
Int32.TryParse(cellValue, out id);
SharedStringItem item = GetSharedStringItemById(workbookPart, id);
if (item.Text != null)
{
cellValue = item.Text.Text;
}
else if (item.InnerText != null)
{
cellValue = item.InnerText;
}
else if (item.InnerXml != null)
{
cellValue = item.InnerXml;
}
break;
case CellValues.Boolean:
switch (cellValue)
{
case "0":
cellValue = "FALSE";
break;
default:
cellValue = "TRUE";
break;
}
break;
}
}
else
{
int excelDate;
if (Int32.TryParse(cellValue, out excelDate))
{
var styleIndex = (int)cell.StyleIndex.Value;
var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;
var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);
if (cellFormat.NumberFormatId != null)
{
var numberFormatId = cellFormat.NumberFormatId.Value;
var numberingFormat = numberingFormats.Cast<NumberingFormat>().SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);
if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("/yy")) //TODO here i should think of locales
{
DateTime dt = DateTime.FromOADate(excelDate);
cellValue = dt.ToString("MM/dd/yyyy");
}
}
}
}
}
return cellValue;
}
public static string GetCellValue(WorkbookPart wbPart, WorksheetPart wsPart, string addressName)
{
string value = String.Empty; //code from microsoft prefers null, but null is tough to work with
// Use its Worksheet property to get a reference to the cell
// whose address matches the address you supplied.
Cell theCell = wsPart.Worksheet.Descendants<Cell>().
Where(c => c.CellReference == addressName).FirstOrDefault();
// If the cell does not exist, return an empty string.
if (theCell != null)
{
value = theCell.InnerText;
// If the cell represents an integer number, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and
// Booleans individually. For shared strings, the code
// looks up the corresponding value in the shared string
// table. For Booleans, the code converts the value into
// the words TRUE or FALSE.
if (theCell.DataType != null)
{
switch (theCell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared strings table.
var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// If the shared string table is missing, something is wrong. Return the index that is in the cell.
//Otherwise, look up the correct text in the table.
if (stringTable != null)
{
value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
}
}
return value;
}

reading a large open xml spreadsheet

I need to read (and parse) large spreadsheet files (20-50MB) using the OpenXML libraries, and there doesn't seem to be a way to stream the rows one at a time for parsing.
I'm consistently getting Out Of Memory exceptions, as it seems that as soon as I attempt to access a row (or iterate), the entire rowset is loaded (100K+ rows).
Each of the calls, whether Elements.Where( with query )
or Descendants ( ), seems to load the entire rowset.
Is there a way to stream or just read a row at a time?
Thx

I found an answer. If you use the OpenXmlReader on the worksheet part, you can iterate through it and effectively lazy-load the elements you come across.
OpenXmlReader oxr = OpenXmlReader.Create(worksheetPart);
While reading, look for
ElementType == typeof(Row)
and load the row (lazily):
Row row = (Row)oxr.LoadCurrentElement();

Do the OpenXML libraries use DOM or SAX models? With DOM you usually have to hold the entire document in memory at once, but with SAX you can stream the events as they come.

Here is the code to read large excel file with multiple sheets using SAX approach:
public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
{
/*Creating a table with 20 columns*/
var dt = CreateProviderRvenueSharingTable();
try
{
/*using stream so that if excel file is in another process then it can read without error*/
using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
{
var workbookPart = spreadsheetDocument.WorkbookPart;
var workbook = workbookPart.Workbook;
/*get only visible (not hidden) tabs*/
var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
/*Remove empty sheets*/
List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
.Where(r => r.InnerText != string.Empty).ToList();
if (rows.Count > 1)
{
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
int i = 0;
int BTR = 0;/*Break the reader while empty rows are found*/
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
/*ignoring first row with headers and check if data is there after header*/
if (i < 2)
{
i++;
continue;
}
reader.ReadFirstChild();
DataRow row = dt.NewRow();
int CN = 0;
if (reader.ElementType == typeof(Cell))
{
do
{
Cell c = (Cell)reader.LoadCurrentElement();
/*the reader skips blank cells, so without this the values end up in the wrong DataTable columns relative to the header*/
if (CN != 0)
{
int cellColumnIndex =
ExcelHelper.GetColumnIndexFromName(
ExcelHelper.GetColumnName(c.CellReference));
if (cellColumnIndex < 20 && CN < cellColumnIndex - 1)
{
do
{
row[CN] = string.Empty;
CN++;
} while (CN < cellColumnIndex - 1);
}
}
/*stopping execution if first cell does not have any value which means empty row*/
if (CN == 0 && c.DataType == null && c.CellValue == null)
{
BTR++;
break;
}
string cellValue = GetCellValue(c, workbookPart);
row[CN] = cellValue;
CN++;
/*if any text exists after T column (index 20) then skip the reader*/
if (CN == 20)
{
break;
}
} while (reader.ReadNextSibling());
}
/*reader skipping blank cells so fill the array upto 19 index*/
while (CN != 0 && CN < 20)
{
row[CN] = string.Empty;
CN++;
}
if (CN == 20)
{
dt.Rows.Add(row);
}
}
/*escaping empty rows below data filled rows after checking 5 times */
if (BTR > 5)
break;
}
reader.Close();
}
}
}
}
}
catch (Exception)
{
throw; // rethrow without resetting the stack trace
}
return dt;
}
private static string GetCellValue(Cell c, WorkbookPart workbookPart)
{
string cellValue = string.Empty;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi =
workbookPart.SharedStringTablePart.SharedStringTable
.Elements<SharedStringItem>()
.ElementAt(int.Parse(c.CellValue.InnerText));
if (ssi.Text != null)
{
cellValue = ssi.Text.Text;
}
}
else
{
if (c.CellValue != null)
{
cellValue = c.CellValue.InnerText;
}
}
return cellValue;
}
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
public static string GetColumnName(string cellReference)
{
/* Advance from L to R until a number, then return 0 through previous position*/
for (int lastCharPos = 0; lastCharPos <= 3; lastCharPos++)
if (Char.IsNumber(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
throw new ArgumentOutOfRangeException("cellReference");
}
private static DataTable CreateProviderRvenueSharingTable()
{
DataTable dt = new DataTable("RevenueSharingTransaction");
// Create fields
dt.Columns.Add("IMId", typeof(string));
dt.Columns.Add("InternalPlanId", typeof(string));
dt.Columns.Add("PaymentReceivedDate", typeof(string));
dt.Columns.Add("PaymentAmount", typeof(string));
dt.Columns.Add("BPS", typeof(string));
dt.Columns.Add("Asset", typeof(string));
dt.Columns.Add("PaymentType", typeof(string));
dt.Columns.Add("InvestmentManager", typeof(string));
dt.Columns.Add("Frequency", typeof(string));
dt.Columns.Add("StartDateForPayment", typeof(string));
dt.Columns.Add("EndDateForPayment", typeof(string));
dt.Columns.Add("Participant", typeof(string));
dt.Columns.Add("SSN", typeof(string));
dt.Columns.Add("JEDate", typeof(string));
dt.Columns.Add("GL", typeof(string));
dt.Columns.Add("JEDescription", typeof(string));
dt.Columns.Add("CRAccount", typeof(string));
dt.Columns.Add("ReportName", typeof(string));
dt.Columns.Add("ReportLocation", typeof(string));
dt.Columns.Add("Division", typeof(string));
return dt;
}
What this code does:
1. It reads the sheets in ascending order, starting from the first.
2. If the Excel file is in use by another process, OpenXML can still read it.
3. It handles blank cells.
4. It skips the empty rows that follow the data.
5. It reads 5000 rows within 4 seconds.
