Read excel sheet data in columns using OpenXML - c#

Is there a way to read an Excel sheet column-wise rather than row by row using the OpenXML SDK and C#?
I have already tried the EPPlus package, but ran into problems because my application also uses ".xlsm" files, which are not supported by EPPlus. So I need a solution in OpenXML for reading data by column.
If anyone has an example, that would help.
Thanks
Sri

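// Assumes `document` is an open SpreadsheetDocument, `sheets` holds its Sheet elements,
// and GetColumnName/GetRowIndex are helpers that split a reference such as "B12".
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
// Get the cells in the specified column and order them by row.
IEnumerable<Cell> cells = worksheetPart.Worksheet.Descendants<Cell>()
    .Where(c => string.Compare(GetColumnName(c.CellReference.Value), columnName, true) == 0)
    .OrderBy(r => GetRowIndex(r.CellReference));
foreach (var cell in cells)
{
    // Process each cell of the column here, e.g. cell.CellValue?.Text
    // (shared-string cells hold an index into the SharedStringTable, not the text itself).
}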
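The answer above calls GetColumnName and GetRowIndex but does not show them. Below is a minimal sketch of such helpers. It assumes plain A1-style references like "B12", with no sheet prefix and no $ signs. The class name CellRef is hypothetical. Either call the methods as CellRef.GetColumnName(...) or paste them into the class that runs the query. GetRowIndex returns uint to match the type behind Row.RowIndex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for the GetColumnName/GetRowIndex calls in the answer above.
// Assumes plain A1-style references such as "B12" (no sheet name, no '$').
public static class CellRef
{
    // "AB12" -> "AB": the leading letters name the column.
    public static string GetColumnName(string cellReference)
    {
        return Regex.Match(cellReference, "^[A-Za-z]+").Value;
    }

    // "AB12" -> 12: the trailing digits are the 1-based row number.
    public static uint GetRowIndex(string cellReference)
    {
        return uint.Parse(Regex.Match(cellReference, "[0-9]+$").Value);
    }
}
```

The column comparison in the answer is case-insensitive (string.Compare with ignoreCase = true), so the regex accepts lower-case letters as well.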

Related

How to use match function or similar with openxml

I'm creating two worksheets with OpenXML and want to add a hyperlink from a cell in sheet 2 to a cell in sheet 1. I know that with Microsoft Excel Interop you can use the MATCH function to find the row number where a value is found. I was wondering how the same task could be done with OpenXML instead.
With Microsoft Excel Interop, I did something like this:
var excelApp = new Application();
WorksheetFunction function = excelApp.WorksheetFunction;
var rowNumber = function.Match(currentItem.originID, sheet1.Range["B1","B" + rowCount], 0);
Here, sheet1 is the sheet I'm searching through, and rowCount is the number of rows in that sheet. This finds currentItem.originID in column B and returns the row number where the match was found. Strictly, MATCH returns the position within the range, which equals the row number here because the range starts at B1.
How could I do something similar to this but with openXML?
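The Open XML SDK has no calculation engine, so there is no WorksheetFunction to call. You read the cells and search them yourself. Below is a minimal sketch of the exact-match (match_type 0) part. It assumes column B has already been read into (row number, text) pairs in row order, for example with the column-reading approach from the first question. Shared-string cells must be resolved to their text first. MatchHelper and FindRow are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper emulating MATCH(value, B1:Bn, 0) over cells already read from the sheet.
public static class MatchHelper
{
    // Returns the row number of the first cell whose text equals lookup
    // (case-insensitive, like Excel's exact match), or null where MATCH would give #N/A.
    // Wildcards (* and ?) are not supported in this sketch.
    public static uint? FindRow(IEnumerable<(uint Row, string Text)> cells, string lookup)
    {
        foreach (var cell in cells)
        {
            if (string.Equals(cell.Text, lookup, StringComparison.OrdinalIgnoreCase))
            {
                return cell.Row;
            }
        }
        return null;
    }
}
```

Each pair keeps the row number from its cell reference rather than relying on the cell's position in the list. OpenXML omits empty cells from the XML, so a cell's position in the list is not necessarily its row.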

What is the easiest way to count .xlsx workbook sheets using c#, NPOI and an XSSF workbook?

I am trying to count the number of sheets in a workbook. The workbook is created using NPOI, and there doesn't seem to be a way to count the sheets in the C# version of NPOI.
This is tricky both to explain and to show, but I will give it a try.
What I am trying to do is use an existing Excel file as a template for statistics. This file can contain a varying number of template sheets, and I need to count them to know where to place my new sheets and how to rename them.
The sender of the data only has to choose which template sheet should be filled with which data; I then remove the template sheets from the workbook after all the data has been inserted.
What I have tried:
I have read the documentation and searched for information and have tried the following approaches:
getNumberOfSheets - How to know number of sheets in a workbook?
Problem with this approach: the C# version of NPOI doesn't seem to have getNumberOfSheets.
Convert found row-counters into sheet-counters - NPOI - Get excel row count to check if it is empty
I can't really adapt that code to work for sheets, as the functionality for sheets and rows is too different.
var sheetIndex = 0;
foreach (var sheet in requestBody.Sheets)
{
    if (sheet.TemplateNumber == "")
    {
        sheetTemplate = templateWorkbook.CreateSheet(sheet.Name);
    }
    else
    {
        sheetTemplate = templateWorkbook.CloneSheet(Convert.ToInt32(sheet.SheetTemplate));
        if (!templates.Contains(Convert.ToInt32(sheet.SheetTemplate)))
        {
            templates.Add(Convert.ToInt32(sheet.SheetTemplate));
        }
        // Do the maths to make sure we add the name to the newly created sheet further down the code (I need the actual index here)
    }
    // Insert statistics
    // After inserting statistics:
    workingCopy.SetSheetName(sheetIndex, sheet.Name);
    foreach (var template in templates)
    {
        workingCopy.RemoveSheetAt(template);
    }
}
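You can get the number of sheets from the NumberOfSheets property of the XSSFWorkbook class. NPOI exposes many Java getters, such as getNumberOfSheets, as C# properties, and NumberOfSheets is also declared on the IWorkbook interface.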
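The question's code also needs some index bookkeeping. Here is a hedged sketch of it. It assumes that, as in Apache POI, CloneSheet appends the copy at the end of the workbook, in which case the new sheet's index is NumberOfSheets - 1. Template sheets should then be removed from the highest index down, because each RemoveSheetAt call shifts the indices of the sheets after it. RemoveInDescendingOrder is a hypothetical helper. It takes the removal method as a delegate, so it can be tried on a plain List.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SheetRemoval
{
    // Hypothetical helper: calls removeAt once per distinct index, highest first, so that
    // removing one sheet never shifts an index that is still waiting to be removed.
    // With NPOI it could be called as RemoveInDescendingOrder(templates, workbook.RemoveSheetAt).
    public static void RemoveInDescendingOrder(IEnumerable<int> indices, Action<int> removeAt)
    {
        foreach (int index in indices.Distinct().OrderByDescending(i => i))
        {
            removeAt(index);
        }
    }
}
```

Call it once, after all sheets have been filled, rather than inside the loop over requestBody.Sheets.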

Colorize Entire Row of Cells Having a Text/Value on a Column in Excel using EPPlus

I need to format the entire row of cells that have a given value in one column, using EPPlus.
For example, colorize the rows that have the text "yes" in column H.
To achieve this I used Excel conditional formatting rules (EPPlus), but I could only format individual cells, not the entire row. How can I accomplish this?
Given that worksheet is an ExcelWorksheet and rowNumber is the row to format:
var rangeAddress = $"{rowNumber}:{rowNumber}"; // e.g. "5:5", the whole of row 5
var expression = worksheet.ConditionalFormatting
    .AddExpression(new ExcelAddress(rangeAddress));
// Fill color for the matching row (Color is System.Drawing.Color).
expression.Style.Fill.BackgroundColor.Color = Color.Yellow;
// not trying to be lazy here - you already have your formula.
expression.Formula = "IF(something)";
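A single rule can also colorize every row whose column H contains "yes". A common technique is to apply the rule to the whole data range and anchor only the column in the formula, e.g. $H2="yes" for a range that starts at row 2. Excel evaluates a conditional-formatting formula relative to the top-left cell of the range, so each cell tests column H of its own row. Below is a minimal sketch of building that formula. The helper name is hypothetical, and the range "A2:Z100" in the comments is an assumption about your data. The EPPlus calls in the comments are the same ones used in the answer above.

```csharp
using System;

public static class RowRuleFormula
{
    // Hypothetical helper: builds e.g. $H2="yes" for a rule whose range starts at firstRow.
    // The '$' fixes the column; the row stays relative, so every row checks its own H cell.
    public static string ColumnEquals(string column, int firstRow, string text)
    {
        // Excel string literals escape a double quote by doubling it.
        string quoted = "\"" + text.Replace("\"", "\"\"") + "\"";
        return "$" + column + firstRow + "=" + quoted;
    }
}

// Possible EPPlus usage (the range "A2:Z100" is an assumption about your data):
// var rule = worksheet.ConditionalFormatting.AddExpression(new ExcelAddress("A2:Z100"));
// rule.Formula = RowRuleFormula.ColumnEquals("H", 2, "yes");   // $H2="yes"
// rule.Style.Fill.BackgroundColor.Color = Color.Yellow;
```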

How to add a pivot table filter field using EPPlus and C#

I'm creating an Excel file using the EPPlus library and C#.
In this Excel file there is a sheet with a pivot table, to which I'd like to add a filter field (Country), just like the one in a manually created Excel sheet.
How can I do this using EPPlus?
I haven't seen your code, but you can do something like this:
var ws = excelPackage.Workbook.Worksheets["name of your worksheet"]; // or Worksheets[index]
var pivotTable = ws.PivotTables[0]; //assuming that you only have 1
var pivotField = pivotTable.Fields["Country"];
pivotTable.PageFields.Add(pivotField); // page fields appear in the pivot table's Filters area
Hope it's a little bit helpful. Don't forget to save your Excel file at the end (for example with excelPackage.Save()).

OpenXML does not help to read large Excel files contrary to documentation

The documentation says that "the following code segment is used to read a very large Excel file using the DOM approach", and then gives an example. I used it to read a relatively large file with 700K rows. This is my code so far:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    // no other code
}
When I start my program, it runs out of memory (>1 GB) within just five seconds. The debugger points to this line of code:
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
So, I need to know whether OpenXML really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).
EDIT
One more mysterious thing. This is the code I have now:
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
while (reader.Read())
{
    if (reader.ElementType == typeof(Row))
    {
        count++;
    }
}
This gives me over a million rows in the count variable. However, the first sheet has only 14K rows and the second has 700K, which is very strange. So, my extra question is how to parse only the rows with data using the SAX approach.
And one final mystery of reading large Excel files with OpenXML. Someone in this thread says: "Turns out that the worksheets are enumerated backwards for some reason (so the first of my three sheets is actually index 3)". So, my final extra question is how to get the sheet you want. At the moment I use this code:
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
But given what they say, I'm not sure that in my case I actually get the first worksheet.
You seem to have a few questions, I'll try and tackle them one-by-one.
So, I need to know whether OpenXML really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).
Yes, the OpenXml SDK is great for reading large files but you may need to use a SAX approach rather than a DOM approach. From the same documentation you cite:
However, the DOM approach requires loading entire Open XML parts into memory, which can cause an Out of Memory exception when you are working with really large files.... Consider using SAX when you need to handle very large files.
The DOM approach loads the whole sheet into memory, which for a large sheet can cause out-of-memory exceptions. With the SAX approach you read each element in turn, which reduces memory consumption considerably.
So, my extra question is how to parse only rows with data using SAX approach
You are only getting the rows that have data (or at least the rows that exist in the XML) using the SDK. You appear to have asked this as a separate question which I've answered in more detail but essentially you are seeing the start and end of each row element using the code in your question. See my answer to your Why does OpenXML read rows twice question for more details.
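The sketch below illustrates the start/end behavior and the streaming idea. Note that it swaps in plain System.Xml.XmlReader, a different API from the OpenXmlReader in the question. It counts only start tags of row elements in the SpreadsheetML namespace. With the SDK, the stream could come from worksheetPart.GetStream(). In the test it is a small hand-written sheetData fragment. The class name RowCounter is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RowCounter
{
    // SpreadsheetML main namespace used by worksheet parts.
    const string MainNs = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Streams through the worksheet XML and counts <row> start tags only. End tags
    // (XmlNodeType.EndElement) are skipped, so each row is counted once; self-closing
    // <row/> elements are still counted because they are reported as Element nodes.
    public static int CountRows(Stream worksheetXml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(worksheetXml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "row"
                    && reader.NamespaceURI == MainNs)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With OpenXmlReader itself, the corresponding fix to the question's loop is to also check reader.IsStartElement before incrementing count.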
So, my final extra question is how to get the sheet you want.
You need to find the Sheet by name which is a descendant of the Workbook. Once you have that you can use its Id to get the WorksheetPart:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    // FirstOrDefault returns null when no sheet has that name, so the null check below is meaningful
    // (First() would throw instead).
    Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
    if (sheet != null)
    {
        WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
        //read worksheetPart...
    }
}
